Speech recognition method, remote controller, information terminal, telephone communication terminal and speech recognizer

ABSTRACT

A speech recognition method that can be preferably applied to equipment constantly performing speech recognition converts speech into an acoustic parameter series, calculates for the acoustic parameter series the likelihood of a hidden Markov model 22 corresponding to the speech unit label series about a registered word and the likelihood of a virtual model 23 corresponding to the speech unit label series for recognition of speech other than the registered word, and performs speech recognition based on the likelihoods.

TECHNICAL FIELD

The present invention relates to a speech recognition method for controlling by speech an equipment unit available in a common living environment, and to a remote controller, an information terminal, a telephone communication terminal, and a speech recognizer using the speech recognition method.

BACKGROUND ART

In a conventional remote controller, each equipment unit requires its own remote controller, and the same remote controller usually cannot remotely control different equipment units. For example, a remote controller for a television cannot remotely control an air-conditioner. A remote controller is provided with a number of switches depending on the operation contents to be controlled, and a control signal for a target equipment unit is selected based on the press status of the switches and transmitted to the target equipment unit. In the case of a video tape recorder, etc., there are a number of necessary operation buttons such as a button for selection of a desired television station, a button for designation of a time for reservation of a program, a button for setting the running status of a recording tape, etc., and the operations of the buttons are complicated. Furthermore, since a remote controller is required for each target equipment unit, the user has to correctly understand the correspondence between each remote controller and its target equipment unit, which has been very laborious.

A remote controller which aims at eliminating the above-mentioned large number of switches and controlling the operations of a plurality of target equipment units using only one remote controller has been disclosed by, for example, Japanese Patent Laid-Open No. 2-171098. In the prior art, the remotely controlled contents are specified by speech input, and a control signal is generated based on a speech recognition result. The speech recognition remote controller of the prior art has a rewritable map for use in converting a speech recognition result into an equipment control code so that a plurality of target equipment units can be operated, and the contents of the map are rewritten depending on the equipment unit to be operated. The map rewriting operation requires changing an IC card storing the map of conversion codes for each target equipment unit. When a target equipment unit is changed, a corresponding IC card has to be searched for.

In the speech recognition remote controller described in Japanese Patent Laid-Open No. 5-7385, a prohibition flag is stored for the operation contents to be prohibited when they are generated based on the operation status of the equipment unit in the equipment status memory using a correspondence table between equipment and word, and a correspondence table between control signal and equipment status.

However, when a plurality of equipment units are controlled by a single remote controller using speech recognition technology, the number of words to be recognized increases. Therefore, the contents of input speech are not always correctly recognized, that is, they may be recognized as contents different from the designated contents, thereby causing a malfunction and reducing the convenience of the remote controller. Particularly, when an acoustic equipment unit such as a television, an audio device, etc. is controlled, noise generated by the target equipment unit can start a speech recognizing process, so that the equipment unit can be operated without any utterance of the user, or an utterance correctly referring to desired control contents can be misrecognized due to the noise generated by the acoustic equipment, thereby requiring the utterance to be repeated many times.

For the speech recognition remote controller for controlling the above-mentioned acoustic equipment, Japanese Patent Laid-Open No. 57-208596 discloses means for improving the recognition rate of a speech recognition circuit by muting the audio means of a television receiver, etc. when the utterance of the speech of a user is detected. Japanese Patent Laid-Open No. 10-282993 discloses the technology of improving the detection of a speech command by enhancing the immunity to the error in a speech recognizing process by providing a sound compensator used in correcting a microphone signal with an audio signal transmitted by an audio equipment unit evaluated in the position of the speech input device by modeling a transmission line in a space between a speaker and a microphone using a speech command input from a speech input device and a signal formed by an audio signal and other signals of background noise. In this case, when the speech recognition remote controller is used, a special circuit has to be provided in advance for an instruction to perform a muting process for a target equipment unit, and special knowledge such as adjusting the position and sensitivity of a microphone, etc. is required. Therefore, there has been a problem in using it as a general-purpose device.

Furthermore, with the speech recognition remote controller according to the above-mentioned conventional technology, and with an increasing number of target equipment units to be controlled, there can be a malfunction due to the misrecognition of an unknown word, an unnecessary word, an utterance beyond the prediction of the system, etc. Therefore, to realize a more convenient speech recognition remote controller, a rejecting capability of identifying an incorrect recognition result and an utterance beyond the prediction of the system is demanded. Especially, in the status in which a speech recognizing process is constantly performed, the noise caused under normal living conditions in a use environment, for example, the conversation among friends, the sound of the steps of a person walking near the remote controller, the cries of pets, the noise made in the cooking operation in the kitchen, etc. cannot be eliminated by the current speech recognition technology. As a result, there has been the problem that misrecognition occurs frequently. If the allowance range of the matching determination with a registered word is strictly set to reduce the misrecognition, the misrecognition can actually be reduced, but a target word to be recognized can also be rejected frequently, thereby requiring repeated utterance and constituting a nuisance for a user.

The above-mentioned problem is not limited to the remote controller, but various speech recognition devices such as an information terminal, a telephone communication terminal, etc. have similar problems.

The present invention has been developed to solve the above-mentioned problems with the conventional technology, and aims at providing a speech recognition method that is applicable to equipment constantly performing speech recognition and that reduces the misrecognition due to noise caused under normal living conditions, as well as a remote controller, an information terminal, a telephone communication terminal, and a speech recognizer using the speech recognition method.

DISCLOSURE OF INVENTION

To solve the above-mentioned problems, the present invention includes the following configuration. That is, the speech recognition method according to the present invention performs speech recognition by converting input speech of a target person whose speech is to be recognized into an acoustic parameter series, and comparing, using a Viterbi algorithm, the acoustic parameter series with the acoustic model corresponding to the speech unit label series about a registered word. The method provides, in parallel to the speech unit label series for the registered word, a speech unit label series for recognition of an unnecessary word other than a registered word, and also calculates the likelihood of the speech unit label series for the unnecessary word in the comparing process using the Viterbi algorithm, so that an unnecessary word is recognized as an unnecessary word when it is input as input speech. That is, the speech is converted into an acoustic parameter series for which the likelihood of the acoustic model for recognizing a registered word corresponding to the speech unit label series about the registered word and the likelihood of the acoustic model for recognizing an unnecessary word corresponding to the speech unit label series for recognition of the speech other than the registered word are calculated. Based on the likelihoods, the speech recognition is conducted.

With the above-mentioned configuration, if noise caused under normal living conditions, etc. containing no registered words, that is, speech other than a registered word, is converted into an acoustic parameter series, then the likelihood of the acoustic model corresponding to the speech unit label series about the registered word is calculated as a small value while the likelihood of the acoustic model corresponding to the speech unit label series about the unnecessary word is calculated as a large value. Based on these likelihoods, the speech other than the registered word can be recognized as an unnecessary word, thereby preventing the speech other than the registered word from being misrecognized as a registered word.
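
As a minimal sketch of the decision described above, the following Python function compares the accumulated likelihoods of the registered words with that of the unnecessary-word model. The function name, the dictionary layout and the use of log-likelihood scores are assumptions made for illustration and do not appear in the specification.

```python
def recognize(registered_scores, unnecessary_score):
    """Return the most likely registered word, or None if the input is rejected.

    registered_scores maps each registered word to its accumulated
    log-likelihood; unnecessary_score is the accumulated log-likelihood of
    the unnecessary-word model for the same acoustic parameter series.
    """
    best_word = max(registered_scores, key=registered_scores.get)
    # Reject when the unnecessary-word model explains the input better.
    if unnecessary_score > registered_scores[best_word]:
        return None
    return best_word
```

In this sketch, speech other than a registered word is reported as None, so no control signal would be generated for it.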

The acoustic model corresponding to the speech unit label series can be an acoustic model using a hidden Markov model, and the speech unit label series for recognition of the unnecessary word can be a virtual speech unit model obtained by equalizing all available speech unit models. That is, the acoustic model for recognizing an unnecessary word can be converged into a virtual speech unit model obtained by equalizing all speech unit models.

With the above-mentioned configuration, when the speech containing a registered word is converted into an acoustic parameter series, the likelihood of the hidden Markov model corresponding to the speech unit label series about a registered word is calculated as larger than the likelihood of the virtual speech unit model obtained by equalizing all speech unit models for the acoustic parameter series. Based on the likelihoods, a registered word contained in the speech can be recognized. When noise caused under normal living conditions, etc. containing no registered words, that is, speech other than a registered word, is converted into an acoustic parameter series, for the acoustic parameter series, the likelihood of the virtual speech unit model obtained by equalizing all speech unit models is calculated as larger than the likelihood of the hidden Markov model corresponding to the speech unit label series about a registered word. Based on the likelihoods, the speech other than the registered word can be recognized as an unnecessary word, thereby preventing the speech other than the registered word from being misrecognized as a registered word.
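
The specification does not state how the speech unit models are "equalized". One possible reading, assumed here purely for illustration, is that the parameters of all phoneme models are averaged. The sketch below averages the mean and variance vectors of hypothetical single-Gaussian phoneme models; the data layout and the function name are not taken from the source.

```python
def equalize_models(phoneme_models):
    """Average the parameters of all phoneme models into one virtual model.

    phoneme_models maps a phoneme label to a dict with "mean" and "var"
    lists of equal length (a hypothetical single-Gaussian output
    distribution; real hidden Markov models are usually richer).
    """
    models = list(phoneme_models.values())
    dim = len(models[0]["mean"])
    count = len(models)
    mean = [sum(m["mean"][d] for m in models) / count for d in range(dim)]
    var = [sum(m["var"][d] for m in models) / count for d in range(dim)]
    return {"mean": mean, "var": var}
```

A model built this way fits no particular phoneme well but fits every kind of sound moderately, which would match the behavior described above.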

The acoustic model corresponding to the speech unit label series can be an acoustic model using a hidden Markov model, and the speech unit label series for recognition of the unnecessary word can have a self-loop network formed by phonemes of vowels only. That is, the acoustic model for recognizing an unnecessary word can be a group of phoneme models corresponding to the phonemes of vowels that has a self-loop from the end point of the group to the starting point; the likelihood of the phoneme model group corresponding to the phonemes of vowels is calculated for the acoustic parameter series, and the maximum value is accumulated to determine the likelihood of the unnecessary word model.

With the above-mentioned configuration, when the speech containing a registered word is converted into an acoustic parameter series, depending on the existence of the phoneme of the consonant contained in the acoustic parameter series, for the acoustic parameter series, the likelihood of the hidden Markov model corresponding to the speech unit label series about a registered word is calculated as larger than the likelihood of the self-loop network configured by the phonemes of vowels only. Based on the likelihoods, the registered word contained in the speech can be recognized. When the noise caused under normal living conditions, etc., that is, speech containing no registered words, that is, speech other than a registered word, is converted into an acoustic parameter series, depending on the phoneme of a vowel contained in the acoustic parameter series and not contained in a registered word, the likelihood of the self-loop network configured by the phonemes of vowels only is calculated as larger than the likelihood of the hidden Markov model corresponding to the speech unit label series about a registered word for the acoustic parameter series. Based on the likelihoods, the speech other than the registered word can be recognized as an unnecessary word, and the speech other than the registered word can be prevented from being misrecognized as a registered word.
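
The accumulation of the maximum vowel likelihood can be sketched as follows. This is a simplified illustration that assumes one log-likelihood per vowel model per acoustic frame and ignores transition probabilities and multi-state phoneme models; the function name and the data layout are hypothetical.

```python
def vowel_loop_log_likelihood(frames):
    """Accumulate, frame by frame, the score of the best-matching vowel model.

    frames is a list of dicts, one per acoustic frame, each mapping a vowel
    label to the log-likelihood of that vowel model for the frame.
    """
    total = 0.0
    for frame in frames:
        # The self-loop lets any vowel follow any vowel, so the best
        # vowel can be chosen independently for each frame.
        total += max(frame.values())
    return total
```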

On the other hand, to solve the above-mentioned problem, the remote controller according to the present invention can remotely control by speech a plurality of operation targets, and includes: storage means for storing a word to be recognized indicating a remote operation; means for inputting speech uttered by a user; speech recognition means for recognizing the word to be recognized and contained in the speech uttered by the user using the storage means; and transmission means for transmitting an equipment control signal corresponding to a word to be recognized and actually recognized by the speech recognition means, and the speech recognition method is based on the speech recognition method according to any of claims 1 to 3. That is, the remote controller includes: speech detection means for detecting the speech of a user; speech recognition means for recognizing a registered word contained in the speech detected by the speech detection means; and transmission means for transmitting an equipment control signal corresponding to the registered word recognized by the speech recognition means. The speech recognition means recognizes a registered word contained in the speech detected by the speech detection means in the speech recognition method according to any of claims 1 to 3.

With the above-mentioned configuration, when the noise caused under normal living conditions, etc., which is speech containing no registered words, that is, speech other than a registered word, is uttered by a user, the likelihood of the acoustic model corresponding to the speech unit label series about an unnecessary word is calculated as a large value for the acoustic parameter series of the speech while the likelihood of the acoustic model corresponding to the speech unit label series about the registered word is calculated as a small value. Based on the likelihoods, the speech other than the registered word can be recognized as an unnecessary word, the speech other than the registered word can be prevented from being misrecognized as a registered word, and a malfunction of the remote controller can be avoided.

The remote controller also includes a speech input unit for allowing a user to perform communications, and a communications unit for controlling the setting state to the communications line based on the word to be recognized by the speech recognition means, and the speech input means and the speech input unit of the communications unit can be separately provided.

With the above-mentioned configuration, even while a user is communicating with a partner and the communications occupy the speech input unit of the communications unit, the speech of the user can be input to the speech recognition means and the communications unit can be controlled.

The remote controller can also include control means for performing at least one of a process of transmitting and receiving mail by speech, a process of managing a schedule by speech, the memo processing by speech, and a notifying process by speech.

With the above-mentioned configuration, a user can perform the process of transmitting and receiving mail by speech, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech by only uttering a registered word without any physical operation.

To solve the above-mentioned problem, the information terminal according to the present invention includes: speech detection means for detecting the speech of a user; speech recognition means for recognizing a registered word contained in the speech detected by the speech detection means; and control means for performing at least one of the speech recognizing process, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech. The speech recognition means can recognize a registered word contained in the speech detected by the speech detection means in the speech recognition method according to any of claims 1 to 3. The process of transmitting and receiving mail by speech can be performed by, for example, a user inputting by speech the contents of mail, converting the speech into speech data, transmitting the speech data by attaching it to electronic mail, receiving the electronic mail to which the speech data is attached, and regenerating the speech data. The process of managing a schedule by speech can be performed by, for example, a user inputting by speech the contents of a schedule, converting the speech into speech data, inputting the execution day of the schedule, and managing the schedule with the speech data associated with the execution day. The memo processing by speech can be performed by, for example, a user inputting by speech the contents of a memo, converting the speech into speech data, and regenerating the speech data at a request of the user. The notifying process by speech can be performed by, for example, a user inputting by speech the contents of a notice, converting the speech into speech data, inputting a notice timing, and regenerating the speech data at the notice timing.

With the configuration, when noise caused under normal living conditions, etc., that is, speech containing no registered words, that is, speech other than a registered word, is uttered by a user, the likelihood of the acoustic model corresponding to the speech unit label series about an unnecessary word is calculated as larger for the acoustic parameter series of the speech while the likelihood of the acoustic model corresponding to the speech unit label series about the registered word is calculated as smaller. Based on the likelihoods, the speech other than the registered word can be recognized as an unnecessary word, thereby preventing the speech other than the registered word from being misrecognized as a registered word, and suppressing a malfunction of the information terminal. Furthermore, the user can perform the process of transmitting and receiving mail by speech, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech only by uttering a registered word without a physical operation.

On the other hand, to solve the above-mentioned problem, the telephone communication terminal according to the present invention can be connected to a public telephone line network or an Internet communications network, and includes: speech input/output means for inputting and outputting speech; speech recognition means for recognizing input speech; storage means for storing personal information including the name and phone number of a communication partner; screen display means; and control means for controlling each means. The speech input/output means has respective and independent input/output systems in the communications unit and the speech recognition unit. That is, the terminal includes a speech input unit for allowing a user to input by speech a registered word relating to a telephone operation; a speech recognition unit for recognizing the registered word input through the speech input unit; and a communications unit, having a speech input unit for allowing a user to perform communications, for controlling the connection status to a communications line according to the registered word recognized by the speech recognition unit. The speech input unit of the speech recognition unit and the speech input unit of the communications unit are individually provided.

With the above-mentioned configuration, even while a user is communicating with a partner and the communications occupy the input/output system of the communications unit, the speech of the user can be input to the speech recognition unit, and the communications unit can be controlled.

Additionally, to solve the above-mentioned problem, the telephone communication terminal according to the present invention can be connected to a public telephone line network or an Internet communications network, and includes: speech input/output means for inputting and outputting speech; speech recognition means for recognizing input speech; storage means for storing personal information including the name and phone number of a communication partner; screen display means; and control means for controlling each means. The storage means separately stores a name vocabulary list of specific names including the name of a person registered in advance; a number vocabulary list of arbitrary phone numbers; a telephone call operation vocabulary list of telephone operations during communications; and a call receiving operation vocabulary list of telephone operations for an incoming call. All telephone operations relating to an outgoing call, a disconnection, and an incoming call can be performed by the speech recognition means, the storage means, and the control means by input of speech. That is, the storage means individually stores a name vocabulary list in which specific names are registered, a number vocabulary list in which arbitrary phone numbers are registered, a telephone call operation vocabulary list in which words related to telephone operations during the communications are registered, and a call receiving operation vocabulary list in which words related to telephone operations are registered when an incoming call is received. The speech recognition means selects a vocabulary list stored in the storage means depending on the recognition result by the speech recognition means or the status of the communications line, refers to the vocabulary list, and recognizes the word contained in the speech input through the speech input/output means.

With the above-mentioned configuration, the vocabulary list can be changed into an appropriate list depending on the situation, thereby preventing an occurrence of misrecognition by noise caused under normal living conditions, etc., which is unnecessary speech.
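
The switching of vocabulary lists according to the status of the communications line can be illustrated with the following sketch. The status names and the words in each list are invented for the example; the specification only states that separate lists are stored and that one is selected depending on the situation.

```python
# Hypothetical vocabulary lists keyed by a hypothetical line status.
VOCABULARY_LISTS = {
    "idle": ["call", "redial"],        # stands in for the name and number lists
    "in_call": ["hang up", "hold"],    # telephone call operation vocabulary list
    "incoming": ["answer", "reject"],  # call receiving operation vocabulary list
}


def select_vocabulary(line_status):
    """Return the vocabulary list that is active for the given line status."""
    return VOCABULARY_LISTS[line_status]


def recognize_in_context(candidate, line_status):
    """Accept a candidate word only if it belongs to the active list."""
    return candidate if candidate in select_vocabulary(line_status) else None
```

Because only the words relevant to the current situation are active, a word from another list cannot be output by mistake.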

The method of recognizing a phone number can also be realized by recognizing, with the speech recognition method, a number string pattern formed by a predetermined number of digits or symbols, using the number vocabulary list of the storage means and a phone number vocabulary network for recognition of an arbitrary phone number, with all digits input by continuous utterance. That is, the storage means stores a serial number vocabulary list in which number strings corresponding to all digits of phone numbers are registered, and the speech recognition means can refer to the serial number vocabulary list stored in the storage means when a phone number contained in the input speech is recognized.

With the above-mentioned configuration, when a phone number is to be recognized, the user only has to continuously utter a number string corresponding to all the digits of the phone number, so that the phone number is recognized in a short time.
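
A number string pattern with a predetermined number of digits can be checked as in the sketch below. The English digit words and the allowed lengths of 10 and 11 digits are assumptions for illustration only; the specification does not give concrete patterns.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
ALLOWED_LENGTHS = {10, 11}  # assumed digit counts, not from the source


def digits_from_utterance(words, allowed_lengths=ALLOWED_LENGTHS):
    """Convert continuously uttered digit words into a phone number string.

    Returns None when a word is not a digit or when the number of digits
    does not match any allowed pattern.
    """
    if any(word not in DIGIT_WORDS for word in words):
        return None
    number = "".join(DIGIT_WORDS[word] for word in words)
    return number if len(number) in allowed_lengths else None
```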

The screen display means can have an utterance timing display function of announcing an utterance timing. That is, it can be announced that the speech recognition means is in a status in which it can recognize a registered word.

With the configuration, by uttering a word at the utterance timing announced by the screen display means, a user can utter a registered word with an appropriate timing, so that the registered word is appropriately recognized.

Second control means for performing at least one of the process of transmitting and receiving mail by speech, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech can be provided based on the input speech recognized by the speech recognition means.

With the configuration, a user can perform the process of transmitting and receiving mail by speech, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech only by uttering a registered word without a physical operation.

The speech recognition means can recognize a registered word contained in input speech in the speech recognition method according to any of claims 1, 2, and 3.

With the above-mentioned configuration, when a user utters noise caused under normal living conditions, etc. containing no registered words, that is, speech other than a registered word, the likelihood of the acoustic model corresponding to the speech unit label series about an unnecessary word is calculated as a large value for the acoustic parameter series of the speech while the likelihood of the acoustic model corresponding to the speech unit label series about a registered word is calculated as a small value. Based on the likelihoods, the speech other than the registered word is recognized as an unnecessary word, thereby preventing the speech other than the registered word from being misrecognized as a registered word, and avoiding a malfunction of the telephone communication terminal.

On the other hand, to solve the above-mentioned problem, the speech recognizer according to the present invention includes: speech detection means for detecting the speech of a user; speech recognition means for recognizing a registered word contained in the speech detected by the speech detection means; and utterance timing notice means for announcing that the speech detection means is in a status in which the means can recognize a registered word.

With the above-mentioned configuration, by uttering speech when the status of recognizing a registered word is announced, a user can utter a registered word with an appropriate timing, so that the registered word is easily recognized.

Volume notice means for announcing the volume of speech detected by the speech detection means can also be provided.

With the above-mentioned configuration, a user can be helped in uttering a word at an appropriate volume, so that the registered word is easily recognized.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of the remote controller according to the first embodiment of the present invention;

FIG. 2 shows a rough configuration of the remote controller shown in FIG. 1;

FIG. 3 is a flowchart of the arithmetic process performed by the remote controller shown in FIG. 2;

FIG. 4 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 3;

FIG. 5 is an explanatory view of a speech recognizing process performed in the arithmetic process shown in FIG. 3;

FIG. 6 is an explanatory view of a vocabulary network used in the speech recognizing process shown in FIG. 5;

FIG. 7 is an explanatory view of a vocabulary network in which the unnecessary word model shown in FIG. 6 is a virtual phoneme model obtained by equalizing all phoneme models;

FIG. 8 is an explanatory view of a vocabulary network in which the unnecessary word model shown in FIG. 6 is a self-loop of phonemes forming vowels;

FIG. 9 is an explanatory view of a vocabulary network in which the unnecessary word model shown in FIG. 6 is a combination of a virtual phoneme model obtained by equalizing all phoneme models and a self-loop of phonemes forming vowels;

FIG. 10 is an explanatory view of a vocabulary network in which the unnecessary word model shown in FIG. 6 is a group of phonemes forming vowels;

FIG. 11 is an explanatory view of a vocabulary network without an unnecessary word model;

FIG. 12 is a block diagram of the information terminal according to the second embodiment of the present invention;

FIG. 13 shows a rough configuration of the information terminal shown in FIG. 12;

FIG. 14 is a flowchart of the arithmetic process performed by the information terminal shown in FIG. 13;

FIG. 15 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 14;

FIG. 16 is a flowchart of the arithmetic process performed by the information terminal shown in FIG. 13;

FIG. 17 is a flowchart of the arithmetic process performed by the information terminal shown in FIG. 13;

FIG. 18 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 17;

FIG. 19 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 17;

FIG. 20 is a flowchart of the arithmetic process performed by the information terminal shown in FIG. 13;

FIG. 21 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 20;

FIG. 22 is a flowchart of the arithmetic process performed by the information terminal shown in FIG. 13;

FIG. 23 is a block diagram of a telephone communication terminal having a speech recognizing function according to the third embodiment of the present invention;

FIG. 24 is a block diagram of a telephone communication terminal having a speech recognizing function as a variation of the third embodiment of the present invention;

FIG. 25 is a flowchart of the arithmetic process performed by the central control circuit shown in FIG. 23;

FIG. 26 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 25;

FIG. 27 is a flowchart of the arithmetic process performed by the central control circuit shown in FIG. 23;

FIG. 28 is an explanatory view of an image displayed on the LCD display device in the arithmetic process shown in FIG. 27;

FIG. 29 is a flowchart of the arithmetic process performed by the central control circuit shown in FIG. 23; and

FIG. 30 is a flowchart of the arithmetic process performed by the central control circuit shown in FIG. 23.

BEST MODE FOR CARRYING OUT THE INVENTION

The embodiments of the present invention are described below by referring to the attached drawings. FIG. 1 is a primary block diagram of the remote controller according to the first embodiment of the present invention. The remote controller shown in FIG. 1 comprises the body of the remote controller for recognition of the speech of a user, that is, a remote controller body 1, and an infrared emitting unit 2 for issuing a control signal as an infrared signal based on the recognition result. The speech of the user is input from the speech input device (microphone 3) of the remote controller body 1, transmitted through an amplifier 4, and converted by an A/D converter 5 into a digitized acoustic parameter (for example, a spectrum, etc.). The sampling of the input analog speech is not specifically limited, but the speech is normally sampled and digitized at a specific frequency in the range from 8 kHz to 16 kHz. In a speech instruction recognition circuit 6, the likelihood of the digitized acoustic parameter is calculated relative to the acoustic parameter for each speech unit, which is a configuration unit of each word in the registered vocabulary list stored and registered in speech instruction information memory 7, thereby extracting the most likely word from the registered vocabulary list. That is, in the speech instruction recognition circuit 6, the likelihood of each word in the registered vocabulary list stored and registered in the speech instruction information memory 7 (hereinafter referred to as a registered word) is calculated for the digitized acoustic parameter for each configuration unit (hereinafter referred to as a speech unit), and the registered word having the largest accumulation value of the likelihood is extracted as the registered word closest to the speech of the user. In the speech instruction recognition circuit 6, the likelihood of the unnecessary word model stored and registered in the speech instruction information memory 7 is simultaneously calculated for the digitized acoustic parameter. When the likelihood of the unnecessary word model is higher than the likelihood of the registered word, it is assumed that no registered word has been extracted from the digitized acoustic parameter.
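
The decision rule described above can be illustrated by the following minimal sketch. It assumes that the accumulated likelihoods are held as log-likelihoods, and the function name, variable names, and numeric values are hypothetical and serve only as an illustration.

```python
def recognize(word_scores, garbage_score):
    """Return the most likely registered word, or None for an unnecessary word.

    word_scores:   dict mapping each registered word to its accumulated
                   log-likelihood (assumed representation).
    garbage_score: accumulated log-likelihood of the unnecessary word model.
    """
    best_word = max(word_scores, key=word_scores.get)
    # When the unnecessary word model is more likely than every registered
    # word, no registered word is regarded as having been extracted.
    if garbage_score > word_scores[best_word]:
        return None
    return best_word


# Illustrative values only; these are not measured likelihoods.
scores = {"terebi": -120.5, "bideo": -141.2, "eakon": -150.0}
accepted = recognize(scores, garbage_score=-130.0)
rejected = recognize(scores, garbage_score=-110.0)
```

In this sketch a tie is resolved in favor of the registered word, because the description rejects the input only when the likelihood of the unnecessary word model is higher.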

A speech unit can be a syllable, a phoneme, a semisyllable, a diphone (a pair of phonemes), a triphone (a set of three phonemes), etc. For ease of explanation, the case in which a phoneme is used as a speech unit is described below.

In the speech instruction information memory 7, a control code corresponding to each registered word is stored. The control code corresponding to the registered word extracted by the speech instruction recognition circuit 6, that is, recognized by speech, is called from the speech instruction information memory 7, and transmitted through a central control circuit 8 to an IRED drive control circuit 9 of the infrared emitting unit 2. The IRED drive control circuit 9 calls an IRED code corresponding to the control code from an IRED code information memory 10, and issues it as an infrared signal from an IRED 11.

At this time, the user is simultaneously notified of the speech recognition result. The recognition result is announced visually by displaying it on an LCD display device 12, and is also transmitted to a response speech control circuit 13, which calls response speech data corresponding to the recognition result from a response speech information memory 14 and notifies the user audibly from a speaker 17 as analog speech through a D/A converter 15 and an amplifier 16.

The infrared emitting unit 2 is provided with a photosensor 18. When it is necessary to use an infrared code not registered in the IRED code information memory 10, the infrared code can be added to the IRED code information memory 10 through a photosensor interface circuit 19 by emitting the infrared code to be used toward the photosensor 18.
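
The two-stage lookup from a registered word to a control code and then to an IRED code, together with the addition of a new infrared code, can be sketched as follows. The table contents, code values, and function names are hypothetical; the actual formats of the control codes and IRED codes are not specified in this description.

```python
# Hypothetical contents of the speech instruction information memory 7:
# registered word -> control code.
CONTROL_CODES = {"power": 1, "channel up": 2}

# Hypothetical contents of the IRED code information memory 10:
# control code -> infrared code to be emitted.
IRED_CODES = {1: b"\x01\xfe", 2: b"\x02\xfd"}


def ired_code_for(word):
    """Return the IRED code for a recognized word, or None if none is stored."""
    control_code = CONTROL_CODES.get(word)
    return IRED_CODES.get(control_code)


def learn_ired_code(control_code, received_code):
    """Store an infrared code received through the photosensor (assumed API)."""
    IRED_CODES[control_code] = bytes(received_code)
```

Keeping the two tables separate mirrors the separation between the memory 7 and the memory 10, so that an infrared code can be added without changing the registered vocabulary.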

The hardware to be used is not specifically limited as long as it has the basic functions shown in FIG. 1. In the descriptions below, a case in which a generally marketed personal computer is used as the remote controller body 1, as shown in FIG. 2, is explained. FIG. 3 is a flowchart of the arithmetic process that is performed by the speech recognition remote controller shown in FIG. 2 and transmits an infrared code depending on the speech of a user. In the flowchart, no step for communications is provided, but the information obtained in the arithmetic process is updated and stored in the storage device as needed, and necessary information is read from the storage device at any time. The arithmetic process is performed when the remote controller is started.

In step S1, the speech detected by the microphone 3 is read, and a speech recognizing process, described later, is performed to recognize whether the speech contains a starting password as a registered word, or contains only noise and speech other than the starting password, that is, only unnecessary words. That is, by inputting a starting password by speech, the user notifies the remote controller that a person who wants to operate it is present. A starting password can be arbitrarily set in advance using a favorite word of the user, the speech of the user, etc. However, when the speech recognition function is constantly operated, it is necessary to protect the remote controller against a malfunction due to the noise of normal living conditions picked up by the microphone 3. Therefore, a word not generally used is preferable. It is desired that a word having three or more and fewer than 20 syllables be used, and it is further desired that a word having five or more and fifteen or fewer syllables be used. For example, a word such as “open sesame” is acceptable.

Then, in step S2, it is determined whether or not it has been recognized in step S1 that the starting password is contained in the speech. If the starting password is contained (YES), then control is passed to step S3. Otherwise (NO), control is passed to step S1 again. Therefore, if only noise and speech containing no starting password are input from the microphone 3, they are recognized as unnecessary words, it is assumed that there is no user nearby, and the system enters a status in which input speech is awaited.

In step S3, the speech detected by the microphone 3 is read, and a speech recognizing process, described later, is performed to recognize whether the speech contains the name of target equipment as a registered word, or contains only noise and speech other than the name of the target equipment, that is, only unnecessary words. The registered words for selection of equipment and functions include names of target equipment such as “TV”, “video”, “air-con”, “audio”, “light”, “curtain”, “telephone”, “timer”, “electronic mail”, “speech memo”, etc. If only words or noise not containing registered words are input, they are recognized as unnecessary words, and the system enters a status in which the name of new target equipment is awaited.

In step S4, it is determined whether or not the name of target equipment is contained in the speech. If the name of target equipment is contained (YES), then control is passed to step S6. Otherwise (NO), control is passed to step S3 again. Therefore, if it is recognized that the speech detected by the microphone 3 contains a starting password, a mode in which a user selects target equipment is entered, and the system enters a status in which speech input is awaited until the name of target equipment, etc. is input. If no registered word to be recognized is input by speech within a predetermined time, control is returned to the mode in which a starting password is recognized (steps S1 and S2) (not shown in FIG. 3), and the system enters a status in which speech input is awaited until a starting password is input, that is, a standby status.

In step S6, the speech detected by the microphone 3 is read, and a speech recognizing process, described later, is performed to recognize whether the speech contains the instruction contents for target equipment as a registered word, or contains only noise and speech other than the instruction contents for target equipment, that is, only unnecessary words. That is, when the user selects target equipment, a mode in which the instruction contents of the target equipment can be controlled is entered. For example, when a “TV” is selected as target equipment, an image relating to the operations of the television is displayed on the LCD display device 12 as shown in FIG. 4, and a mode in which a power on/off operation, selection of a channel number, selection of a broadcasting station, adjustment of volume, etc. can be specified is entered.

Then, in step S7, it is determined whether or not it has been recognized in step S6 that the instruction contents of the target equipment are contained in the speech. If the instruction contents of the target equipment are contained (YES), then control is passed to step S8. Otherwise (NO), control is passed to step S6 again. That is, the system enters a status in which input of controllable instruction contents is awaited.

Then, in step S8, the infrared code corresponding to the instruction contents recognized in step S6 is transmitted to the infrared emitting unit 2. That is, when the instruction contents are input by speech, a corresponding infrared code is called based on the recognition result of the instruction contents, and the infrared code is transmitted from the infrared emitting unit 2 to the target equipment. In this mode, when an instruction and noise other than the controllable instruction contents are input, they are recognized as unnecessary words.

In step S9, it is determined whether or not the instruction contents recognized in step S6 indicate the end (for example, “terminate”). If they indicate the end (YES), then the arithmetic process is terminated. Otherwise (NO), control is passed to step S3. That is, if a control instruction indicating an end, for example, “terminate”, is input by speech in this mode, control is returned to the mode in which controllable target equipment is selected (steps S3 and S4). Also, if no registered word relating to equipment control to be recognized, that is, no control instruction, is input by speech within a predetermined time, control is returned to the mode in which the target equipment is selected (not shown in FIG. 3).

In step S9, it is determined whether or not the instruction contents recognized in step S6 indicate standby (for example, “standby”). If the word indicates “standby” (YES), then control is passed to step S1. Otherwise (NO), control is passed to step S10. That is, if a word instructing the speech recognition remote controller to stand by, for example, “standby”, is input by speech in the mode in which the target equipment is selected, then control is returned to the password reception mode.

In step S10, it is determined whether or not the instruction contents recognized in step S6 indicate a word referring to a power-off status (for example, “close sesame”). If it is a word indicating the off status (YES), then the arithmetic process terminates. Otherwise (NO), control is passed to step S10. That is, if a user inputs “close sesame” by speech, the speech recognizer itself can be powered off, thereby completely terminating the system.
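
The mode transitions described for FIG. 3 can be summarized by the following sketch. It follows the explanatory sentences of each step (the starting password enters the equipment selection mode, “terminate” returns to equipment selection, “standby” returns to password reception, and “close sesame” powers the system off). The mode names and the function are hypothetical, the timeouts are omitted, and it is an assumption that an ordinary instruction leaves the system in the instruction mode.

```python
STANDBY, SELECT, COMMAND, OFF = "standby", "select", "command", "off"

# Subset of the equipment names given as examples in the description.
EQUIPMENT = {"TV", "video", "air-con", "audio", "light"}


def next_mode(mode, word):
    """Return the next mode; word is None when an unnecessary word is recognized."""
    if word is None or mode == OFF:
        return mode  # unnecessary words never change the mode
    if mode == STANDBY:
        return SELECT if word == "open sesame" else STANDBY
    if mode == SELECT:
        return COMMAND if word in EQUIPMENT else SELECT
    # COMMAND mode: instruction contents for the selected equipment.
    if word == "terminate":
        return SELECT
    if word == "standby":
        return STANDBY
    if word == "close sesame":
        return OFF
    return COMMAND  # assumed: an infrared code is sent and the mode is kept


mode = STANDBY
for spoken in [None, "open sesame", "TV", "volume up", "terminate"]:
    mode = next_mode(mode, spoken)
```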

When the system is to be resumed while the operating system of the central control circuit 8 is running, only the application software relating to the system needs to be activated. When the operating system is suspended, the system can be activated by physically pressing the power button of the system.

FIG. 5 shows the principle of the process using a hidden Markov model (hereinafter referred to as an HMM for short) in the speech recognizing processes performed in steps S1, S3, and S6 shown in FIG. 3. When the speech recognizing process is performed, first the speech detected by the microphone 3 is converted into a digitized spectrum by a Fourier transform or a wavelet transform, and the speech data is characterized by applying a speech modeling method such as a linear prediction analysis, a cepstrum analysis, etc. to the spectrum. Then, for the characterized speech data, the likelihood of an acoustic model 21 of each word registered in a vocabulary network 20, which is read into the speech recognizing process in advance, is calculated using the Viterbi algorithm. Each registered word is modeled as a serial connection network of HMMs corresponding to a serial connection of speech units (speech unit label series), and the vocabulary network 20 is modeled as a serial connection network corresponding to the registered word group registered in the registered vocabulary list. Each registered word is configured by speech units such as phonemes, and the likelihood is calculated for each speech unit. When the termination of the utterance of a user is detected, the registered word having the largest accumulation value of likelihood is detected from the registered vocabulary list, and is output as the registered word recognized as contained in the speech.
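
As an illustration of the likelihood calculation by the Viterbi algorithm, the following is a minimal sketch for a left-to-right HMM. It makes several simplifying assumptions that are not taken from this description: each acoustic parameter is a single number rather than a feature vector, each state has one Gaussian output distribution, and the transition probabilities are fixed at 0.5. All names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_score(frames, states, log_stay=math.log(0.5), log_next=math.log(0.5)):
    """Best-path log-likelihood of frames for a left-to-right HMM.

    frames: list of scalar acoustic parameters (simplifying assumption).
    states: list of (mean, var) pairs, one per state, in serial order.
    The path must start in the first state and end in the last state.
    """
    neg_inf = float("-inf")
    score = [neg_inf] * len(states)
    score[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new_score = []
        for j, (mean, var) in enumerate(states):
            stay = score[j] + log_stay
            move = score[j - 1] + log_next if j > 0 else neg_inf
            new_score.append(max(stay, move) + log_gauss(x, mean, var))
        score = new_score
    return score[-1]


# Two hypothetical word models scored against the same parameter series.
frames = [0.1, 0.0, 1.0, 1.1, 2.0]
word_a = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
word_b = [(2.0, 0.1), (1.0, 0.1), (0.0, 0.1)]
best = max(("a", word_a), ("b", word_b), key=lambda w: viterbi_score(frames, w[1]))[0]
```

The word whose model yields the larger accumulated value corresponds to the registered word that is output as the recognition result.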

In the present invention, as shown in FIG. 6, a virtual model 23 for recognition of an unnecessary word is set together with a vocabulary network 22 of registered words, and is represented by an HMM in the same manner as a word. As the virtual model 23 for recognition of an unnecessary word, the garbage model method proposed by H. Bourlard, B. D'hoore and J. M. Boite, “Optimizing Recognition and Rejection Performance in Wordspotting Systems,” Proc. ICASSP, Adelaide, Australia, pp. I-373-376, 1994, etc. can be used. Thus, when something other than a word to be controlled, that is, utterance and noise containing no registered words, is input as speech, the likelihood of the virtual model corresponding to the unnecessary word becomes larger than the likelihoods of all registered words, so that the virtual model having the largest likelihood is selected, and a system capable of correctly determining the input of an unnecessary word can be constructed. Since the virtual model 23 for recognition of an unnecessary word is used, a rejection capability can be provided without a practically significant increase in the calculation load of the recognizing process, so that a small portable remote controller can be formed.

In the conventional method, in which the vocabulary network 20 is simply formed by the vocabulary network 22 of registered words without the virtual model 23 for recognition of an unnecessary word, there can inevitably be a malfunction due to an unknown word or an unnecessary word other than a word to be recognized, or misrecognition of an utterance beyond the prediction of the system. Especially in the status in which a speech recognizing process is constantly performed, there can be the problem that misrecognition frequently occurs because of the noise of normal living conditions in a use environment, for example, conversation among friends, the footsteps of a person walking near the remote controller, the sounds made by pets, the noise of cooking in the kitchen, etc. If the allowance range of the matching determination with a registered word is strictly set to reduce the misrecognition, the misrecognition can actually be reduced, but a target word to be recognized can also be rejected frequently, thereby requiring repeated utterance and constituting a nuisance for a user. Furthermore, there can be a method of listing unnecessary words in the registered vocabulary list, but it is not practical to list all unnecessary words because the resultant registered vocabulary list is too large and the required amount of calculation is excessive.

FIG. 6 shows a vocabulary network of the names of target equipment in the speech recognizing process performed in step S3 shown in FIG. 3. The vocabulary network 20 represents registered words for selection of target equipment, that is, the names 22 of the target equipment and the unnecessary word model 23. In more detail, each registered word is configured as shown in FIG. 7, which represents a corresponding phoneme label series. The unnecessary word model 23 is formed as a virtual phoneme model obtained by equalizing all phoneme models, and has a topology similar to those of the speaker-independent phoneme HMMs. The virtual phoneme model obtained by equalizing all available phonemes is generated as follows. That is, each phoneme is modeled as an HMM, the HMM is formed as a series of a plurality of state transitions, and each state is formed by a mixed Gaussian distribution. Then, a set of Gaussian distributions to be shared among phonemes is selected from the mixed Gaussian distributions, the mixed Gaussian distribution is adjusted with a weight for each phoneme, and a virtual phoneme model is obtained by equalizing all available phonemes. The virtual phoneme model with all available phonemes equalized is not limited to a model formed from one cluster. All speech units can be divided into a plurality of clusters (for example, three to five clusters), and a model can be formed for each cluster. Therefore, when a registered word is uttered by a user, the likelihood of the registered word is necessarily large. However, when a word other than a registered word is uttered, the likelihood of the virtual phoneme model becomes larger as a result, thereby enhancing the probability of recognition as an unnecessary word. For example, assume that the names of target equipment such as “TV”, “video”, “air-con”, “light”, “audio”, etc. are registered as registered words, and that the word “takibi”, which is not described in the vocabulary network 22 of registered words shown in FIG. 7, is input. If no unnecessary word model is set, then the likelihood of a described word having a similar phoneme configuration among the registered words (for example, “terebi” in the registered vocabulary list shown in FIG. 7) is the largest and causes misrecognition. However, if an unnecessary word model is set, there is a strong possibility that the likelihood of the virtual phoneme model is the largest, and the recognition as an unnecessary word can reduce the misrecognition to a large extent.

The unnecessary word model shown in FIG. 8 is a self-loop of phonemes forming vowels. That is, the unnecessary word model is a set of HMMs corresponding to the phonemes of vowels, and has a self-loop from the end point to the starting point of the set. The likelihoods of the HMMs corresponding to the phonemes of vowels are calculated for each acoustic parameter of the digitized acoustic parameter series, the largest values are accumulated, and the likelihood of the unnecessary word model is obtained. This is based on the characteristics that almost all words contain vowels, and that among the phoneme classes such as consonants, vowels, fricatives, plosives, etc., the vowels carry the larger acoustic energy. That is, the likelihood of the unnecessary word model is calculated by treating every word as continuous sounds of vowels. Therefore, when a registered word is uttered by a user, the phonemes other than vowels, such as consonants, do not fit the unnecessary word model. Therefore, the likelihood of the unnecessary word model is lower than the likelihood of the registered word, and as a result, the probability of recognition as a registered word is enhanced. However, when a word other than a registered word is uttered, the phoneme models corresponding to a registered word indicate lower values for the phonemes other than vowels, such as consonants. Therefore, the likelihood of the unnecessary word model indicating continuous sounds of vowels is higher and the probability of recognition as an unnecessary word is high, thereby reducing misrecognition. This method is used when it is hard to obtain the label series of the above-mentioned virtual phoneme model, and when existing speech recognition software formed by phoneme models is used.
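
The accumulation of the largest vowel likelihood for each acoustic parameter can be sketched as follows. As a simplifying assumption, each vowel HMM is reduced to a single one-dimensional Gaussian distribution, and the means and variances are hypothetical values chosen only for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical single-state models (mean, variance) for the five vowels.
VOWELS = {
    "a": (1.0, 0.2),
    "i": (2.0, 0.2),
    "u": (3.0, 0.2),
    "e": (4.0, 0.2),
    "o": (5.0, 0.2),
}


def vowel_loop_score(frames, vowels=VOWELS):
    """Accumulate, frame by frame, the largest vowel log-likelihood.

    The self-loop allows any vowel to follow any vowel, so the best vowel
    is chosen independently for each acoustic parameter.
    """
    return sum(
        max(log_gauss(x, mean, var) for mean, var in vowels.values())
        for x in frames
    )
```

A parameter series that stays close to the vowel models receives a high accumulated value, whereas frames that resemble no vowel, such as consonants, lower it.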

Depending on the actual use situation, the unnecessary word recognition rate can be too low, or can be so high that a target instruction word is recognized as an unnecessary word. In such cases, the recognition rate can be optimized by multiplying the likelihood obtained for the unnecessary word model, whether it is the virtual phoneme model or the unnecessary word model using vowel phonemes, by an appropriate factor.
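
The adjustment by a factor can be sketched as follows, assuming that likelihoods are held as log-likelihoods, in which case multiplying the likelihood by a factor corresponds to adding the logarithm of the factor. The function names and numeric values are hypothetical.

```python
import math


def adjusted_garbage_score(garbage_log_likelihood, factor):
    """Apply a multiplicative factor to the unnecessary word likelihood.

    A factor above 1 makes rejection more likely; a factor below 1 makes
    acceptance of a registered word more likely.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return garbage_log_likelihood + math.log(factor)


def is_rejected(best_word_log_likelihood, garbage_log_likelihood, factor=1.0):
    """True when the adjusted unnecessary word model beats the best word."""
    return adjusted_garbage_score(garbage_log_likelihood, factor) > best_word_log_likelihood


# Illustrative values only: the same input is accepted or rejected
# depending on the chosen factor.
accepted_with_unit_factor = not is_rejected(-120.5, -121.0, factor=1.0)
rejected_with_larger_factor = is_rejected(-120.5, -121.0, factor=2.0)
```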

(Embodiment 1)

Described below is the first embodiment of the present invention.

In this embodiment, as shown in FIG. 7, the virtual phoneme model 23 obtained by equalizing all phoneme models is provided as an unnecessary word model. The phoneme model 23 and the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, are provided in parallel in the vocabulary network 20. The vocabulary network 20 is read in the speech recognizing process in step S3 shown in FIG. 3 for a speech remote controller. As unnecessary words, “takibi”, “takeo”, and “fami-com” are input by speech five times for each word. As a result, the probability of recognition as an unnecessary word, that is, the probability of correct recognition as no registered word, is 100%. To check the recognition rate of a target word, that is, a registered word such as “terebi”, “bideo”, “eakon”, “shoumei”, and “oodeo”, each word is uttered ten times, and the resultant correct recognition rate for all these words is 94%.

TABLE 1

Target vocabulary    Phoneme representation
Terebi
Bideo
Eakon
Shoumei
Oodeo

(Embodiment 2)

Described below is the second embodiment of the present invention.

In this embodiment, as shown in FIG. 8, the self-loop model 23′ configured by HMMs corresponding to the phonemes of vowels, that is, “a”, “i”, “u”, “e”, and “o”, is provided as an unnecessary word model. The self-loop model 23′ and the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, are provided in parallel in the vocabulary network 20. The vocabulary network 20 is read in the speech recognizing process in step S3 shown in FIG. 3 for a speech remote controller. As unnecessary words, “takibi”, “takeo”, and “fami-com” are input by speech five times for each word. As a result, the probability of recognition as an unnecessary word, that is, the probability of correct recognition as no registered word, is 100%. To check the recognition rate of a target word, that is, a registered word such as “terebi”, “bideo”, “eakon”, “shoumei”, and “oodeo”, each word is uttered ten times, and the resultant correct recognition rate for all these words is 90%.

(Embodiment 3)

Described below is the third embodiment of the present invention.

In this embodiment, as in the first embodiment shown in FIG. 7, the virtual phoneme model 23 obtained by equalizing all phoneme models is provided as an unnecessary word model. The phoneme model 23 and the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, are provided in parallel in the vocabulary network 20. The vocabulary network 20 is read in the speech recognizing process routine in step S3 shown in FIG. 3 for a speech remote controller. As unnecessary words, “a, i, u, e, o”, “eeto”, “keibi”, “ehen”, “shouchi” and “oodekoron” are input by speech ten times for each word. As a result, the probability of recognition as an unnecessary word, that is, the probability of correct recognition as no registered word, is 92%.

(Embodiment 4)

Described below is the fourth embodiment of the present invention.

In this embodiment, as in the second embodiment shown in FIG. 8, the self-loop model 23′ configured by HMMs corresponding to the phonemes of vowels, that is, “a”, “i”, “u”, “e”, and “o”, is provided as an unnecessary word model. The self-loop model 23′ and the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, are provided in parallel in the vocabulary network 20. The vocabulary network 20 is read in the speech recognizing process in step S3 shown in FIG. 3 for a speech remote controller. As unnecessary words, “a, i, u, e, o”, “eeto”, “keibi”, “ehen”, “shouchi” and “oodekoron” are input by speech ten times for each word. As a result, the probability of recognition as an unnecessary word, that is, the probability of correct recognition as no registered word, is 93%.

(Embodiment 5)

Described below is the fifth embodiment of the present invention.

In this embodiment, as shown in FIG. 9, the virtual phoneme model 23 obtained by equalizing all phoneme models, and the self-loop model 23′ configured by HMMs corresponding to the phonemes of “a”, “i”, “u”, “e”, and “o” are provided as unnecessary word models. The models 23 and 23′ and the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, are provided in parallel in the vocabulary network 20. The vocabulary network 20 is read in the speech recognizing process routine in step S3 shown in FIG. 3 for a speech remote controller. As unnecessary words, “a, i, u, e, o”, “eeto”, “keibi”, “ehen”, “shouchi” and “oodekoron” are input by speech ten times for each word. As a result, the probability of recognition as an unnecessary word, that is, the probability of correct recognition as no registered word, is 100%. To check the recognition rate of a target word, that is, a registered word such as “terebi”, “bideo”, “eakon”, “shoumei”, and “oodeo”, each word is uttered ten times, and the resultant correct recognition rate for all these words is 88%.

(Embodiment 6)

Described below is the sixth embodiment of the present invention.

In this embodiment, as shown in FIG. 10, HMMs 23″ corresponding to the phonemes of “a”, “i”, “u”, “e”, and “o”, that is, the unnecessary word models shown in FIG. 8 excluding the self-loop, are provided as unnecessary word models. The models 23″ and the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, are provided in parallel in the vocabulary network 20. The vocabulary network 20 is read in the speech recognizing process in step S3 shown in FIG. 3 for a speech remote controller. As unnecessary words, “a, i, u, e, o”, “eeto”, “keibi”, “ehen”, “shouchi” and “oodekoron” are input by speech ten times for each word. As a result, the probability of recognition as an unnecessary word, that is, the probability of correct recognition as no registered word, is 23%.

COMPARATIVE EXAMPLE 1

Described below is the first comparative example according to the present invention.

In this comparative example, as shown in FIG. 10, the vocabulary network 20 configured only by the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, without using a virtual model for recognition of an unnecessary word, is read into the speech recognizing process routine in step S3 shown in FIG. 3 to prepare the speech recognition remote controller. Then, as unnecessary words, “takibi”, “takeo”, and “fami-com” are input by speech five times for each word. As a result, “takibi” is completely misrecognized as “terebi”, “takeo” is completely misrecognized as “bideo”, and “fami-com” is completely misrecognized as “eakon”. Therefore, the probability of recognition as an unnecessary word, that is, the probability of no misrecognition as a registered word, is 0%. To check the recognition rate for target words, that is, the registered words “terebi”, “bideo”, “eakon”, “shoumei”, and “oodeo”, each word is input by speech ten times, and the correct recognition rate is 98% for all these words.

COMPARATIVE EXAMPLE 2

Described below is the second comparative example according to the present invention.

In this comparative example, as in the first comparative example, as shown in FIG. 11, the vocabulary network 20 configured only by the registered word list described in Table 1, that is, the vocabulary network 22 of registered words, without using a virtual model for recognition of an unnecessary word, is read into the speech recognizing process routine in step S3 shown in FIG. 3 to prepare the speech recognition remote controller. Then, as unnecessary words, “a, i, u, e, o”, “eeto”, “keibi”, “ehen”, “shouchi” and “oodekoron” are input by speech ten times for each word. As a result, “a, i, u, e, o” is easily misrecognized as “bideo”, “eeto” is easily misrecognized as “eakon”, “keibi” is easily misrecognized as “terebi”, “ehen” is easily misrecognized as “eakon”, “shouchi” is easily misrecognized as “shoumei”, and “oodekoron” is easily misrecognized as “oodeo”. Therefore, the probability of recognition as an unnecessary word, that is, the probability of no misrecognition as a registered word, is 0%.

In the present embodiment, the speech instruction information memory 7 corresponds to the storage means, the microphone 3 corresponds to the means for inputting speech uttered from a user, the speech instruction recognition circuit 6 corresponds to the speech recognition means, and the infrared emitting unit 2 corresponds to the transmission means.

The second embodiment of the present invention is explained below by referring to the attached drawings. In this embodiment, a speech recognizing process similar to that of the first embodiment is applied to an information terminal that recognizes the registered word contained in the speech of a user and controls the electronic mail transmitting and receiving function, the schedule managing function, the speech memo processing function, the speech timer function, etc. The speech memo processing function is the function of allowing a user to input by speech the contents of a memo, recording the speech, and reproducing the speech at a request of the user. The speech timer function is the function of allowing a user to input by speech the contents of a notice, recording the speech, inputting a notice timing, and reproducing the speech with the notice timing.

FIG. 12 is a primary block diagram of the information terminal to which an analog telephone is applied according to the second embodiment of the present invention. The information terminal shown in FIG. 12 comprises a speech recognition unit 51 for recognizing the registered word contained in the speech of the user and performing the electronic mail transmitting and receiving function, the schedule managing function, the speech memo processing function, the speech timer function, etc., and a communications unit 52 for connection to a communications line, etc. based on the recognition result. The speech of the user is input from a microphone 53 of the speech recognition unit 51, passes through an amplifier 54, and is converted into a digitized acoustic parameter by an A/D converter 55. A speech instruction recognition circuit 56 calculates, for each speech unit, the likelihood of each registered word in the registered vocabulary list stored and registered in speech instruction information memory 57 for the digitized acoustic parameter, and the registered word having the largest accumulation value of the likelihood is extracted as the closest to the speech of the user. The speech instruction recognition circuit 56 simultaneously calculates the likelihood of an unnecessary word model stored and registered in the speech instruction information memory 57 for the digitized acoustic parameter. When the likelihood of the unnecessary word model is larger than the likelihood of the registered word, it is assumed that no registered word has been extracted from the digitized acoustic parameter.

The speech instruction information memory 57 stores, as registered vocabulary lists, an electronic mail transmitting vocabulary list storing registered words relating to the electronic mail transmitting function, an electronic mail receiving vocabulary list storing registered words relating to the electronic mail receiving function, a schedule management vocabulary list storing registered words relating to the schedule managing function, a speech memo vocabulary list storing registered words relating to the speech memo processing function, and a speech timer vocabulary list storing registered words relating to the speech timer function, as well as control codes corresponding to a mail transmit command and a mail receive command which are registered words. If an electronic mail transmission starting password is extracted, that is, obtained as a recognition result, in the speech instruction recognition circuit 56, then the arithmetic process described later is performed to control the electronic mail transmitting function based on the speech of the user. The user is allowed to input by speech the contents of the mail, and the speech is detected by the microphone 53 and stored as speech data in RAM 69 through a microphone interface circuit 68. When an electronic mail transmit command is input, the control code for control of a telephone corresponding to the command is called from the speech instruction information memory 57 and transmitted to the communications unit 52, and the speech data is attached to the electronic mail and transmitted. Similarly, when the speech instruction recognition circuit 56 obtains an electronic mail reception starting password as a recognition result, the arithmetic process described later for controlling the electronic mail receiving function is performed depending on the speech of the user. When an electronic mail receive command is input, the control code for control of a telephone corresponding to the command is called from the speech instruction information memory 57 and transmitted to the communications unit 52, thereby receiving electronic mail to which speech data is attached, and the speech data is reproduced by a speaker 67 through a D/A converter 65 and the amplifier 16. The control code is not specifically limited as long as it can control the communications unit 52. However, since an AT command is commonly used, an AT command is also adopted in the present embodiment.

When the speech instruction recognition circuit 56 obtains a starting password of the schedule managing function as a recognition result, a central control circuit 58 performs the arithmetic process described later for controlling the schedule managing function depending on the speech of the user. The user is allowed to input the contents of the schedule by speech, the speech is detected by the microphone 53 and stored as speech data in the RAM 69 through the microphone interface circuit 68, the execution day of the schedule is input, and the execution day is associated with the speech data, thereby registering the schedule. When a starting password for the speech memo processing function is extracted, that is, obtained as a recognition result, in the speech instruction recognition circuit 56, the arithmetic process described later for controlling the speech memo processing function depending on the speech of the user is performed in the central control circuit 58. The user is allowed to input the contents of the memo by speech, the speech is detected by the microphone 53 and stored as speech data in the RAM 69 through the microphone interface circuit 68, and the speech data is called from the RAM 69 at a request of the user and regenerated by the speaker 67 through the D/A converter 65 and the amplifier 16. Furthermore, when a starting password for the speech timer function is obtained as a recognition result in the speech instruction recognition circuit 56, the arithmetic process described later for controlling the speech timer function depending on the speech of the user is performed in the central control circuit 58. The user is allowed to input the contents of a notice by speech, the speech is detected by the microphone 53 and stored as speech data in the RAM 69 through the microphone interface circuit 68, the notice timing of the speech is input, and the speech data is called from the RAM 69 at the notice timing and regenerated by the speaker 67 through the D/A converter 65 and the amplifier 16.

Available hardware is not specifically limited as long as the basic functions shown in FIG. 12 are included. In the description below, the case where a commonly marketed personal computer is used as the speech recognition unit 51, as shown in FIG. 13, is explained.

FIG. 14 shows the process performed by the information terminal shown inFIG. 13 in the flowchart of the arithmetic process of transmittingelectronic mail depending on the speech of a user. Although no step forcommunications is provided in the flowchart, the information obtained inthe arithmetic process is updated and stored in the storage device atany time, and necessary information is read at any time from the storagedevice.

When the arithmetic process is performed, first in step S101, the speech detected by the microphone 53 is read, and the speech recognizing process is performed to recognize whether the speech contains the starting password (for example, the word “electronic mail transmission”), which is a registered word, or contains only noise and speech other than the starting password, that is, only unnecessary words. If the starting password is contained (YES), control is passed to step S102. Otherwise (NO), the process flow is repeated.
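The decision in step S101 can be sketched as a comparison of likelihoods, following the abstract's description of a model for registered words and a virtual model for everything else. This is a minimal sketch under stated assumptions: the log-likelihood scores are taken as already computed, and the function names `spot_registered_word` and `wait_for_password` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def spot_registered_word(word_scores: Dict[str, float],
                         virtual_score: float) -> Optional[str]:
    """Return the best-scoring registered word, or None for an unnecessary word.

    word_scores holds one log-likelihood per registered word; virtual_score is
    the log-likelihood of the virtual model for speech other than registered words.
    """
    if not word_scores:
        return None
    best_word = max(word_scores, key=lambda word: word_scores[word])
    return best_word if word_scores[best_word] > virtual_score else None


def wait_for_password(utterances: Iterable[Tuple[Dict[str, float], float]],
                      password: str) -> Optional[int]:
    """Repeat the S101 check until the starting password is recognized.

    Returns the index of the utterance that contained the password, or None.
    """
    for index, (word_scores, virtual_score) in enumerate(utterances):
        if spot_registered_word(word_scores, virtual_score) == password:
            return index  # YES branch: control would pass to step S102
    return None  # input exhausted while repeating the NO branch
```

Here an utterance that scores higher under the virtual model is treated as noise or an unnecessary word, so the loop simply continues to the next utterance.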

In step S102, the electronic mail transmitting vocabulary list is read as a registered vocabulary list, and a speech mail launcher is activated as shown in FIG. 15 so that a list of registered words with which the user can issue an instruction is displayed on an LCD display device 62. A registered word displayed on the LCD display device 62 can be, for example, a mail generate command (for example, “generate mail”) to be uttered when mail is to be generated.

In step S103, the speech detected by the microphone 53 is read, and the speech recognizing process is performed to recognize whether the speech contains a mail generate command, or contains only noise and speech other than the mail generate command, that is, only unnecessary words. If the speech contains a mail generate command (YES), control is passed to step S104. Otherwise (NO), the process flow is repeated.

Then, in step S104, the speech detected in the microphone 53 is read,and the speech recognizing process of recognizing whether thedestination list select command (for example, a word “destination list”)which is a registered word to be contained in the speech is contained,or only the noise and speech other than the destination list selectcommand, that is, the unnecessary words, are contained is performed. Ifthe destination list select command is contained in the speech (YES),then control is passed to step S105. Otherwise (NO), control is passedto step S106.

In step S105, as shown in FIG. 15, a list of the names of the personswhose mail addresses are registered, that is, the names of the personswhose mail addresses are stored in a predetermined data area of astorage device, is displayed on the LCD display device 62, the speechdetected by the microphone 53 is read, and the speech recognizingprocess of recognizing the names of the persons which are the registeredwords contained in the speech is performed, the mail addresscorresponding to the name of the person is called, and control is passedto step S107.

In step S106, a message requesting to utter the mail address of a maildestination is displayed on the LCD display device 62, the speechdetected by the microphone 53 is read, the speech recognizing process ofrecognizing alphabetical characters which indicate the registered wordcontained in the speech is performed, and the mail address of thedestination is recognized, thereby passing control to step S107.

In step S107, the speech recognizing process of recognizing a record start command (for example, “start recording”) which is a registered word is performed on the speech detected by the microphone 53, and it is determined whether or not the speech contains the record start command. If the record start command is contained (YES), control is passed to step S108. Otherwise (NO), the process flow is repeated.

In step S108, a message requesting to utter the contents of mail isdisplayed on the LCD display device 62, speech data is generated byrecording the speech data detected by the microphone 53 for apredetermined time, and the speech data is stored in a predetermineddata area of the storage device as the contents of mail.

In step S109, the speech recognizing process of recognizing anadditional record command (for example, “additional recording”) which isa registered word is performed on the speech detected by the microphone53, and it is determined whether or not the speech contains theadditional record command. If the additional record command is contained(YES), control is passed to step S108. Otherwise (NO), control is passedto step S110.

In step S110, the speech detected by the microphone 53 is read, and itis determined whether or not the speech contains a record contentsconfirm command (for example, “confirm record contents”). If the speechcontains the record contents confirm command (YES), control is passed tostep S111. Otherwise (NO), control is passed to step S112.

In step S111, the speech data generated in step S108, that is, thecontents of the mail, is read from a predetermined data area in thestorage device, the speech data is regenerated by the speaker 67, andcontrol is passed to step S112.

In step S112, the speech detected by the microphone 53 is read, and it is determined whether or not the speech contains a transmit command (for example, “confirm transmission”). If the transmit command is contained (YES), control is passed to step S113. Otherwise (NO), control is passed to step S117.

In step S113, an AT command for calling up a provider is read from apredetermined data area of the storage device, and the AT command istransmitted to a speech communications unit 102 for connection to themail server of the provider.

Then, control is passed to step S114, the speech data generated in stepS108, that is, the contents of mail, is read from a predetermined dataarea of the storage device, the speech data is attached to electronicmail, and the electronic mail is transmitted to the mail address read instep S105 or the mail address which is input in step S106.

Then in step S115, an AT command specifying a disconnection of a circuitis called from a predetermined data area of the storage device, and theAT command is transmitted to the communications unit 52.

In step S116, a message notifying that the transmission of theelectronic mail has been completed is displayed on the LCD displaydevice 62, and then control is passed to step S118.

In step S117, the speech data generated in step S108, that is, thecontents of mail, is deleted from a predetermined data area of thestorage device, and control is passed to step S118.

In step S118, the speech recognizing process of recognizing a terminatecommand (for example, “terminate”) which is a registered word isperformed on the speech detected by the microphone 53, and it isdetermined whether or not the speech contains the terminate command. Ifthe terminate command is contained (YES), the arithmetic process isterminated. Otherwise (NO), control is passed to step S104.
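The flow of FIG. 14 described in steps S101 to S118 can be summarized as a transition table. The sketch below is illustrative only: the table `TRANSITIONS` and the helper `run_flow` are hypothetical names, the recognition results are supplied as a list of YES/NO answers rather than computed from speech, and the NO branch of step S112 is taken here to be the deletion step S117, which is an assumption of this sketch.

```python
from typing import Dict, Iterable, List

# "yes"/"no" entries are decision steps; "next" entries are unconditional.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "S101": {"yes": "S102", "no": "S101"},   # starting password?
    "S102": {"next": "S103"},                # read vocabulary list, show launcher
    "S103": {"yes": "S104", "no": "S103"},   # mail generate command?
    "S104": {"yes": "S105", "no": "S106"},   # destination list select command?
    "S105": {"next": "S107"},                # pick address by registered name
    "S106": {"next": "S107"},                # spell out address
    "S107": {"yes": "S108", "no": "S107"},   # record start command?
    "S108": {"next": "S109"},                # record mail contents
    "S109": {"yes": "S108", "no": "S110"},   # additional record command?
    "S110": {"yes": "S111", "no": "S112"},   # record contents confirm command?
    "S111": {"next": "S112"},                # play back recording
    "S112": {"yes": "S113", "no": "S117"},   # transmit command? (NO branch assumed)
    "S113": {"next": "S114"},                # call provider
    "S114": {"next": "S115"},                # send mail with speech data
    "S115": {"next": "S116"},                # disconnect
    "S116": {"next": "S118"},                # show completion message
    "S117": {"next": "S118"},                # delete recorded speech data
    "S118": {"yes": "END", "no": "S104"},    # terminate command?
}


def run_flow(answers: Iterable[bool], start: str = "S101",
             max_steps: int = 100) -> List[str]:
    """Walk the table, consuming one YES/NO answer at each decision step."""
    remaining = iter(answers)
    step, visited = start, []
    while step != "END" and len(visited) < max_steps:
        visited.append(step)
        branches = TRANSITIONS[step]
        if "next" in branches:
            step = branches["next"]
            continue
        try:
            answer = next(remaining)
        except StopIteration:
            break  # no more recognition results to consume
        step = branches["yes" if answer else "no"]
    return visited
```

Keeping the branching in a table rather than in nested conditionals makes it easy to compare each entry against the corresponding step in the description.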

FIG. 16 shows the process performed by the information terminal shown inFIG. 13, and is a flowchart of the arithmetic process for receiving,etc. electronic mail according to the speech of the user. In thisflowchart, there is no step for communications. However, the informationobtained in the arithmetic process is updated and stored in the storagedevice, and necessary information is read from the storage device. Whenthe arithmetic process is performed, first in step S201, the speechdetected by the microphone 53 is read, and the speech recognizingprocess of recognizing whether the speech contains a starting password(for example, “receive electronic mail”) or noise or speech other thanthe starting password, that is, only unnecessary words is performed. Ifthe starting password is contained (YES), control is passed to stepS202. Otherwise (NO), the process flow is repeated.

Then, in step S202, an electronic mail receiving vocabulary list is readas a registered vocabulary list, and a speech mail launcher isactivated, and a list of registered words with which a user can issue aninstruction is displayed on the LCD display device 62. A registered wordto be displayed on the LCD display device 62 can be, for example, a mailreceive command (for example, “receive mail”), etc. uttered when mail isto be received.

Then, in step S203, the speech detected by the microphone 53 is read,and it is determined whether or not the speech contains a mail receivecommand. If the mail receive command is contained (YES), control ispassed to step S204. Otherwise (NO), the process flow is repeated.

Then, in step S204, an AT command for a call to a provider is calledfrom a predetermined data area of the storage device, and the AT commandis transmitted to the speech communications unit 102 for connection tothe mail server of the provider.

Then, in step S205, electronic mail is received from the mail serverconnected in step S204, and the electronic mail is stored in apredetermined data area of the storage device.

Then, control is passed to step S206, and a message notifying that theelectronic mail has been completely received is displayed on the LCDdisplay device 62.

Then, in step S207, the AT command indicating the disconnection of aline is called from a predetermined data area of the storage device, andthe AT command is transmitted to the communications unit 52.

In step S208, a list of mail received in step S205 is displayed on theLCD display device 62, the speech detected by the microphone 53 is read,the speech recognizing process of recognizing a mail select commandwhich is a registered word contained in the speech is performed, and auser is allowed to select specific mail from a list of mail. A mailselect command can be anything so far as a user is allowed to select aspecific mail. For example, when the name of a mail transmitter isdisplayed in a mail list, the listed name can be used.

Then, in step S209, the speech recognizing process of recognizing a regenerate command (for example, “regenerate”) which is a registered word is performed on the speech detected by the microphone 53, and it is determined whether or not the speech contains a regenerate command. If a regenerate command is contained (YES), then control is passed to step S210. Otherwise (NO), control is passed to step S211.

In step S210, the speech data attached to the mail selected in stepS208, that is, the contents of mail, is read from a predetermined dataarea of the storage device, and the speech data is regenerated by thespeaker 67, thereby passing control to step S211.

In step S211, the speech recognizing process of recognizing a scheduleregister command (for example, “register schedule”) which is aregistered word is performed on the speech detected by the microphone53, and it is determined whether or not the speech contains the scheduleregister command. If a schedule register command is contained (YES),then control is passed to step S212. Otherwise (NO), control is passedto step S217.

In step S212, a schedule management vocabulary list is read as aregistered vocabulary list, a scheduler is activated, and a list ofregistered words with which the user can issue an instruction isdisplayed on the LCD display device 62.

Then, in step S213, it is determined whether or not header information(for example, information designating a date, etc.) is described in themail selected in step S208. If header information is described (YES),then control is passed to step S214. Otherwise (NO), control is passedto step S215.

In step S214, the speech data attached to the mail selected in stepS208, that is, the contents of mail, is stored in a predetermined dataarea of the storage device as the contents of a schedule of the date ofthe header information described in the mail. Then, a message requestingto input a select large/small item command (for example, “private”,“meet”, etc.) of the contents of a schedule is displayed on the LCDdisplay device 62, the speech detected by the microphone 53 is read, andthe speech recognizing process of recognizing a select large/small itemcommand of the contents of a schedule which is a registered wordcontained in the speech is performed. The recognition result is storedin a predetermined data area of the storage device using the recognitionresult as the speech data, that is, a large/small item of the schedulecontents, and then control is passed to step S217.

On the other hand, in step S215, a message requesting input of theexecution day of a schedule is displayed on the LCD display device 62,the speech detected by the microphone 53 is read, and the speechrecognizing process of recognizing a year-month-day input command (forexample, “date”) which is a registered word contained in the speech isperformed.

Then, in step S216, the speech data attached to the mail selected instep S208 is stored in a predetermined data area of the storage deviceas the contents of the schedule on the date recognized in step S215.Then, the message requesting to input a select large/small item command(for example, “private”, “meet”, etc.) of the schedule contents isdisplayed on the LCD display device 62, the speech detected by themicrophone 53 is read, and the speech recognizing process of recognizingthe select large/small item command of the schedule contents which isregistered words contained in the speech is performed. Then, therecognition result is stored in a predetermined data area of the storagedevice as the speech data, that is, a large/small item of the schedulecontents, thereby passing control to step S217.
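The branch in steps S213 to S216 amounts to choosing the schedule date from the mail header when one is present and from the user's spoken date otherwise. A minimal sketch, assuming both dates have already been parsed into `date` values; the function name `resolve_schedule_date` is hypothetical.

```python
from datetime import date
from typing import Optional


def resolve_schedule_date(header_date: Optional[date],
                          spoken_date: Optional[date]) -> date:
    """Pick the date under which the attached speech data is registered."""
    if header_date is not None:
        return header_date  # S213 YES: use the date in the header (S214)
    if spoken_date is None:
        raise ValueError("no header date and no spoken date available")
    return spoken_date      # S213 NO: use the date recognized in S215 (S216)
```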

In step S217, the speech recognizing process of recognizing a terminatecommand (for example, “terminate”) which is a registered word isperformed on the speech detected by the microphone 53, and it isdetermined whether or not the speech contains the terminate command. Ifthe terminate command is contained (YES), the arithmetic process isterminated. Otherwise (NO), control is passed to step S203.

FIG. 17 shows the process performed by the information terminal shown in FIG. 13, and is a flowchart of the arithmetic process for performing the schedule managing function according to the speech of the user. In this flowchart, there is no step for communications. However, the information obtained in the arithmetic process is updated and stored in the storage device, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S301, the speech detected by the microphone 53 is read, and the speech recognizing process is performed to recognize whether the speech contains a starting password (for example, “speech schedule”), or contains only noise or speech other than the starting password, that is, only unnecessary words. If the starting password is contained (YES), control is passed to step S302. Otherwise (NO), the process flow is repeated.

Then, in step S302, a schedule management vocabulary list is read as a registered vocabulary list, the speech schedule launcher is activated as shown in FIG. 18, and a list of registered words with which a user can issue an instruction is displayed on the LCD display device 62. A registered word displayed on the LCD display device 62 can be, for example, a schedule register command (for example, “set schedule”) to be uttered when a schedule is registered, or a schedule confirm command (for example, “confirm schedule”) to be uttered when a schedule is confirmed.

Then, in step S303, a message requesting to utter the execution day of aschedule is displayed on the LCD display device 62, the speech detectedby the microphone 53 is read, and the speech recognizing process ofrecognizing a year-month-day input command (for example, “date”) whichis a registered word contained in the speech is performed.

Then, control is passed to step S304, and the speech recognizing processof recognizing a schedule register command which is a registered word isperformed on the speech detected by the microphone 53, and it isdetermined whether or not the speech contains a schedule registercommand. If a schedule register command is contained (YES), then controlis passed to step S305. Otherwise (NO), control is passed to step S310.

In step S305, the speech detected by the microphone 53 is read, thespeech recognizing process of recognizing a schedule start/stop timeinput command (for example, “time”) which is a registered word containedin the speech is performed, and a user is requested to input the startand stop time of the schedule.

Then, in step S306, a message requesting to utter the contents of a schedule is displayed on the LCD display device 62, the speech detected by the microphone 53 is recorded for a predetermined time to generate speech data, and the speech data is stored in a predetermined data area of the storage device as the contents of the schedule on the date recognized in step S303.

Then, in step S307, a message requesting to input a select large/smallitem command (for example, “private”, “meet”, etc.) of the contents of aschedule is displayed on the LCD display device 62, the speech detectedby the microphone 53 is read, and the speech recognizing process ofrecognizing a select large/small item command of the contents of aschedule which is a registered word contained in the speech isperformed. Then, the recognition result is stored in a predetermineddata area of the storage device as the speech data generated in stepS306, that is, a large/small item of the contents of the schedule.

In step S308, a message requesting to utter a set command of a reminder function (for example, “set reminder”) is displayed on the LCD display device 62, and the speech recognizing process of recognizing a reminder set command which is a registered word is performed on the speech detected by the microphone 53. Then, it is determined whether or not the speech contains the reminder set command. If the reminder set command is contained (YES), then control is passed to step S309. Otherwise (NO), control is passed to step S324. The reminder function refers to the function of announcing the contents of a schedule at a predetermined timing, and reminds the user of the presence of the schedule.

In step S309, a message requesting to input the name of a destination, the notice time of the reminder, etc. is displayed on the LCD display device 62, the speech detected by the microphone 53 is read, and the speech recognizing process is performed to recognize the set commands, which are registered words contained in the speech, for the notice time of the reminder (for example, “number of minutes before a predetermined time”) and for the name of the destination, so that the user is allowed to input the notice timing, etc. of the reminder function. At the notice time of the reminder, the speech data generated in step S306, that is, the schedule contents, is read from a predetermined data area, the arithmetic process of regenerating the speech data using the speaker 67 is performed, and control is passed to step S324.
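If the notice time is given as a number of minutes before the start of the schedule, as the example command in step S309 suggests, the moment of the reminder follows by simple subtraction. The sketch below assumes the start time and the number of minutes have already been recognized; `reminder_notice_time` and `is_due` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def reminder_notice_time(schedule_start: datetime, minutes_before: int) -> datetime:
    """Return the time at which the schedule contents should be announced."""
    if minutes_before < 0:
        raise ValueError("minutes_before must not be negative")
    return schedule_start - timedelta(minutes=minutes_before)


def is_due(now: datetime, notice_time: datetime) -> bool:
    """True once the notice time has been reached, so playback should start."""
    return now >= notice_time
```

For example, a schedule starting at 10:00 with a reminder set 15 minutes before would be announced at 9:45 on the same day.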

In step S310, the speech recognizing process of recognizing a scheduleconfirm command which is a registered word is performed on the speechdetected by the microphone 53, and it is determined whether or not theschedule confirm command is contained in the speech. If a scheduleconfirm command is contained (YES), then control is passed to step S311.Otherwise (NO), control is passed to step S319.

In step S311, as shown in FIG. 19, the large/small item of the schedulecontents input in steps S214, S216, and S307 in the arithmetic processfor receiving the electronic mail is read from a predetermined data areaof the storage device, and a list of the items is displayed on the LCDdisplay device 62.

In step S312, the speech recognizing process of recognizing a recordcontents confirm command (for example, “confirm”) which is a registeredword is performed on the speech detected by the microphone 53, and it isdetermined whether or not the record contents confirm command iscontained in the speech. If a record contents confirm command iscontained (YES), then control is passed to step S313. Otherwise (NO),control is passed to step S314.

In step S313, the speech data corresponding to the large/small itemlisted on the LCD display device 62 in step S311, that is, the schedulecontents, are regenerated by the speaker 67, and control is passed tostep S314.

In step S314, the speech recognizing process of recognizing a schedule add/register command (for example, “set schedule”) which is a registered word is performed on the speech detected by the microphone 53, and it is determined whether or not the schedule add/register command is contained in the speech. If a schedule add/register command is contained (YES), then control is passed to step S315. Otherwise (NO), control is passed to step S316.

In step S315, a data area for registration of a new schedule is reservedin the storage device, and then control is passed to step S305.

On the other hand, in step S316, the speech recognizing process ofrecognizing a schedule amend command (for example, “amend”) which is aregistered word is performed on the speech detected by the microphone53, and it is determined whether or not the schedule amend command iscontained in the speech. If a schedule amend command is contained (YES),then control is passed to step S305. Otherwise (NO), control is passedto step S317.

In step S317, the speech recognizing process of recognizing a scheduledelete command (for example, “delete”) which is a registered word isperformed on the speech detected by the microphone 53, and it isdetermined whether or not the schedule delete command is contained inthe speech. If a schedule delete command is contained (YES), thencontrol is passed to step S318. Otherwise (NO), control is passed tostep S311.

In step S318, the data area in which a schedule is registered is deletedfrom the storage device, and then control is passed to step S324.

In step S319, the speech recognizing process of recognizing a scheduleretrieve command (for example, “schedule retrieval”) which is aregistered word is performed on the speech detected by the microphone53, and it is determined whether or not the schedule retrieve command iscontained in the speech. If a schedule retrieve command is contained(YES), then control is passed to step S320. Otherwise (NO), control ispassed to step S303.

In step S320, the message requesting to utter a select large/small itemcommand of the schedule contents is displayed on the LCD display device62, and the speech detected by the microphone 53 is read, the speechrecognizing process of recognizing the select large/small item commandof the schedule contents contained in the speech is performed, and theuser is allowed to input a large/small item of the schedule contents tobe retrieved.

Then, in step S321, the speech recognizing process of recognizing aretrieval execute command (for example, “execute retrieval”) which is aregistered word is performed on the speech detected by the microphone53, and it is determined whether or not the retrieval execute command iscontained in the speech. If a retrieval execute command is contained(YES), then control is passed to step S322. Otherwise (NO), control ispassed to step S320.

In step S322, the schedule corresponding to the large/small item of theschedule contents recognized in step S320 is retrieved from apredetermined data area of the storage device, and a retrieval result isdisplayed on the LCD display device 62.
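The retrieval in step S322 can be pictured as filtering stored schedules by the recognized large/small item. This is a sketch under assumptions: the record layout `ScheduleEntry` and the function `retrieve_by_item` are hypothetical, and the text does not specify how schedules are actually laid out in the storage device.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ScheduleEntry:
    day: date            # execution day recognized by speech or read from a header
    item: str            # large/small item, e.g. "private" or "meet"
    speech_data: bytes   # recorded schedule contents


def retrieve_by_item(entries: List[ScheduleEntry], item: str) -> List[ScheduleEntry]:
    """Return all schedules registered under the given item, earliest first."""
    matches = [entry for entry in entries if entry.item == item]
    return sorted(matches, key=lambda entry: entry.day)
```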

In step S323, the speech recognizing process of recognizing a re-retrieve command (for example, “re-retrieval”) which is a registered word is performed on the speech detected by the microphone 53, and it is determined whether or not the re-retrieve command is contained in the speech. If a re-retrieve command is contained (YES), then control is passed to step S324. Otherwise (NO), control is passed to step S320.

In step S324, the speech recognizing process of recognizing a terminatecommand (for example, “terminate”) which is a registered word isperformed on the speech detected by the microphone 53, and it isdetermined whether or not the terminate command is contained in thespeech. If a terminate command is contained (YES), then the processterminates. Otherwise (NO), control is passed to step S303.

FIG. 20 shows the process performed by the information terminal shown in FIG. 13, and is a flowchart of the arithmetic process of performing the speech memo function depending on the speech of a user. In this flowchart, no steps are provided for communications. However, the information obtained in the arithmetic process is updated and stored in the storage device at any time, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S401, the speech detected by the microphone 53 is read, and the speech recognizing process is performed to recognize whether the speech contains a starting password (for example, “speech memo”), which is a registered word, or contains only noise or speech other than the starting password, that is, only unnecessary words. If a starting password is contained (YES), then control is passed to step S402. Otherwise (NO), the process flow is repeated.

Then, in step S402, a speech memo vocabulary list is read as a registered vocabulary list, the speech memo launcher is activated as shown in FIG. 21, and a list of registered words with which a user can issue an instruction is displayed on the LCD display device 62. The registered words to be displayed on the LCD display device 62 can be: a record command (for example, “start record”) to be uttered when speech is to be recorded; a regenerate command (for example, “start regeneration”) to be uttered when a speech memo is to be regenerated; a memo folder number select command (for example, “first”, “second”, etc.), the number being associated with each speech memo, to be uttered when a speech memo is to be selected; etc.

In step S403, the speech recognizing process of recognizing a memofolder number select command which is a registered word is performed onthe speech detected by the microphone 53, and it is determined whetheror not the memo folder number select command is contained in the speech.If a memo folder number select command is contained (YES), then controlis passed to step S404. Otherwise (NO), control is passed to step S407.

In step S404, the speech recognizing process of recognizing a recordcommand which is a registered word is performed on the speech detectedby the microphone 53, and it is determined whether or not the recordcommand is contained in the speech. If a record command is contained(YES), then control is passed to step S405. Otherwise (NO), control ispassed to step S403.

In step S405, a message requesting to utter the memo contents isdisplayed on the LCD display device 62, speech data is generated byrecording speech detected by the microphone 53 for a predetermined time,and the speech data is stored in a predetermined data area in thestorage device as memo contents corresponding to the memo folderselected in step S403.

In step S406, the speech recognizing process of recognizing a recordcontents confirm command (for example, “confirm”) which is a registeredword is performed on the speech detected by the microphone 53, and it isdetermined whether or not the record contents confirm command iscontained in the speech. If a record contents confirm command iscontained (YES), then control is passed to step S408. Otherwise (NO),control is passed to step S409.

In step S407, the speech recognizing process of recognizing a regeneratecommand which is a registered word is performed on the speech detectedby the microphone 53, and it is determined whether or not the regeneratecommand is contained in the speech. If a regenerate command is contained(YES), then control is passed to step S408. Otherwise (NO), the processflow is repeated.

In step S408, the speech data corresponding to the memo folder selectedin step S403, that is, the memo contents, is read from a predetermineddata area of the storage device, and the speech data is regenerated bythe speaker 67, and control is passed to step S409.

In step S409, the speech recognizing process of recognizing a terminatecommand (for example, “terminate”) which is a registered word isperformed on the speech detected by the microphone 53, and it isdetermined whether or not the terminate command is contained in thespeech. If a terminate command is contained (YES), then the processterminates. Otherwise (NO), control is passed to step S403.
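The memo folders used in steps S403 to S408 behave like a small keyed store: a folder is selected by a spoken ordinal, a recording is saved under it, and the same folder is read back for playback. The sketch below is illustrative; the class `SpeechMemoStore` and the mapping `FOLDER_WORDS` are hypothetical, and only the example words “first” and “second” come from the description.

```python
from typing import Dict, Optional

# Spoken memo folder number select commands mapped to folder numbers (assumed).
FOLDER_WORDS: Dict[str, int] = {"first": 1, "second": 2, "third": 3}


class SpeechMemoStore:
    """Keeps one recorded speech memo per numbered folder."""

    def __init__(self) -> None:
        self._memos: Dict[int, bytes] = {}

    def record(self, folder_word: str, speech_data: bytes) -> None:
        """Store speech data in the selected folder (corresponds to step S405)."""
        self._memos[FOLDER_WORDS[folder_word]] = speech_data

    def regenerate(self, folder_word: str) -> Optional[bytes]:
        """Return the speech data for playback (step S408), or None if empty."""
        return self._memos.get(FOLDER_WORDS[folder_word])
```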

FIG. 22 shows the process performed by the information terminal shown in FIG. 13, and is a flowchart of the arithmetic process of performing the speech timer function depending on the speech of a user. In this flowchart, no steps are provided for communications. However, the information obtained in the arithmetic process is updated and stored in the storage device at any time, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S501, the speech detected by the microphone 53 is read, and the speech recognizing process is performed to recognize whether the speech contains a starting password (for example, “speech timer”), which is a registered word, or contains only noise or speech other than the starting password, that is, only unnecessary words. If a starting password is contained (YES), then control is passed to step S502. Otherwise (NO), the process flow is repeated.

Then, in step S502, a speech timer vocabulary list is read as a registered vocabulary list, the speech timer launcher is activated, and a list of registered words with which a user can issue an instruction is displayed on the LCD display device 62. The registered words to be displayed on the LCD display device 62 can be: a timer set command (for example, “set timer”) to be uttered when notice contents and notice timing are set, a timer start command (for example, “start timer”) to be uttered when a timer is operated, etc.

In step S503, the speech recognizing process of recognizing a timer set command, which is a registered word, is performed on the speech detected by the microphone 53, and it is determined whether or not the timer set command is contained in the speech. If a timer set command is contained (YES), then control is passed to step S504. Otherwise (NO), control is passed to step S502.

In step S504, a message requesting input of the time from the start of the operation of the timer to the notice, that is, the notice timing, is displayed on the LCD display device 62, the speech detected by the microphone 53 is read, and the speech recognizing process of recognizing the timer time set command (for example, “minutes”), which is a registered word, is performed.

Then, in step S505, a message requesting an answer as to whether or not the notice contents are to be recorded is displayed on the LCD display device 62, the speech recognizing process of recognizing a record start confirm command (for example, “Yes”), which is a registered word, is performed on the speech detected by the microphone 53, and it is determined whether or not the record start confirm command is contained in the speech. If a record start confirm command is contained (YES), then control is passed to step S506. Otherwise (NO), control is passed to step S502.

In step S506, a message requesting the user to utter the notice contents is displayed on the LCD display device 62, speech data is generated by recording the speech detected by the microphone 53 for a predetermined time, and the speech data is stored in a data area of the storage device as the notice contents to be announced at the time recognized in step S504, that is, at the notice timing.

Then, in step S507, a message requesting confirmation of the notice contents, that is, of the speech data recorded in step S506, is displayed on the LCD display device 62, the speech recognizing process of recognizing a confirm command of the record contents, which is a registered word, is performed on the speech detected by the microphone 53, and it is determined whether or not the speech contains the confirm command of the record contents. If the confirm command of the record contents is contained (YES), then control is passed to step S508. Otherwise (NO), control is passed to step S509.

In step S508, the speech data generated in step S506, that is, the notice contents, is reproduced through the speaker 67, and then control is passed to step S509.

In step S509, the speech recognizing process of recognizing a terminate command (for example, “terminate”), which is a registered word, is performed on the speech detected by the microphone 53, and it is determined whether or not the terminate command is contained in the speech. If a terminate command is contained (YES), then the arithmetic process terminates. Otherwise (NO), control is passed to step S502.

In step S510, the speech recognizing process of recognizing a timer start command, which is a registered word, is performed on the speech detected by the microphone 53, and it is determined whether or not the timer start command is contained in the speech. If a timer start command is contained (YES), then control is passed to step S511. Otherwise (NO), control is passed to step S502.

In step S511, at the time recognized in step S504, that is, at the notice timing, the speech data generated in step S506, that is, the notice contents, is read from a predetermined data area of the storage device and reproduced through the speaker 67, and the arithmetic process is terminated.

As explained above, since the information communications terminal according to the present embodiment performs the electronic mail transmitting and receiving function, the schedule managing function, the speech memo processing function, and the speech timer function by recognizing the registered word contained in the speech of a user, the user can use each function simply by uttering the registered word, without physical operations.

Furthermore, since a speech recognizing process similar to that of the above-mentioned first embodiment is performed, as in the first embodiment, when speech containing no registered words, that is, speech other than the registered words, is uttered by a user, the likelihood of the virtual model 23 is calculated to be large for the acoustic parameter series of the speech, and the likelihood of the vocabulary network 22 of registered words is calculated to be small. Based on these likelihoods, the speech other than the registered word is recognized as an unnecessary word and is prevented from being misrecognized as a registered word, thereby avoiding a malfunction of the information terminal.

According to the present invention, the microphone 53 corresponds to the speech detection means, the speech instruction recognition circuit 56 corresponds to the speech recognition means, and the central control circuit 58 corresponds to the control means.

The third embodiment of the present invention is described below by referring to the attached drawings. In this embodiment, a speech recognizing process similar to the process in the first embodiment is applied to a telephone communication terminal that connects to a communications circuit by recognizing the registered word contained in the speech of a user.

FIG. 23 is a primary block diagram of the telephone communication terminal using an analog telephone or a voice modem according to the third embodiment of the present invention. The telephone communication terminal shown in FIG. 23 comprises a speech recognition unit 101 for controlling speech recognition and a speech communications unit 102 for controlling speech communications, that is, the speech recognition unit 101 recognizes a registered word contained in the speech of a user, and the speech communications unit 102 connects to a communications circuit based on the recognition result. The speech of a user is input from a microphone 103 of the speech recognition unit 101, transmitted through an amplifier 104, and converted by an A/D converter 105 into a digitized acoustic parameter. The sampling of the input analog speech is not specifically limited, but the speech is normally sampled and digitized at a specific frequency in the range from 8 kHz to 16 kHz. The likelihood of the digitized acoustic parameter is calculated relative to the acoustic parameter for each speech unit, which is a configuration unit of each word, for the registered vocabulary list stored and registered in a speech instruction information memory 107 in a speech instruction recognition circuit 106, thereby extracting the most likely word from the registered vocabulary list.

That is, in the speech instruction recognition circuit 106, the likelihood of each word (hereinafter referred to as a registered word) in the registered vocabulary list stored and registered in the speech instruction information memory 107 is calculated for the digitized acoustic parameter for each configuration unit (hereinafter referred to as a speech unit), and the word with the largest accumulation value of the likelihood is extracted as the registered word closest to the speech of the user. In the speech instruction recognition circuit 106, the likelihood of the unnecessary word model stored and registered in the speech instruction information memory 107 is simultaneously calculated for the digitized acoustic parameter. When the likelihood of the unnecessary word model is higher than the likelihood of the registered word, it is assumed that no registered word has been extracted from the digitized acoustic parameter.
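The decision rule described above can be illustrated with a minimal sketch. It assumes that the accumulated likelihoods have already been computed as log scores; the function name `pick_registered_word`, the example words, and the score values are hypothetical and serve only as illustration.

```python
from typing import Dict, Optional


def pick_registered_word(word_scores: Dict[str, float],
                         unnecessary_score: float) -> Optional[str]:
    """Return the best-scoring registered word, or None when rejected.

    word_scores maps each registered word to its accumulated log-likelihood.
    unnecessary_score is the log-likelihood of the unnecessary word model.
    """
    if not word_scores:
        return None
    best_word = max(word_scores, key=word_scores.get)
    # If the unnecessary word model scores higher than the best registered
    # word, it is assumed that no registered word was uttered.
    if unnecessary_score > word_scores[best_word]:
        return None
    return best_word


# Hypothetical scores, for illustration only.
scores = {"make a call": -42.0, "once again": -57.5}
print(pick_registered_word(scores, unnecessary_score=-60.0))
print(pick_registered_word(scores, unnecessary_score=-30.0))
```

In this sketch a tie is resolved in favor of the registered word, because the description rejects the input only when the unnecessary word model is strictly more likely.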

In the registered vocabulary list, registered words and unnecessary words other than the registered words are registered. A speech unit can be a syllable, a phoneme, a semisyllable, a diphone (a sequence of two phonemes), a triphone (a sequence of three phonemes), etc.

In the speech instruction information memory 107, the following are stored as registered vocabulary lists: a name vocabulary list storing names and the phone numbers corresponding to the names, a number vocabulary list for recognition of serial numbers depending on the number of digits corresponding to an arbitrary phone number, a telephone call operation vocabulary list relating to the telephone operation, a call receiving operation vocabulary list relating to the response when an incoming call is received, and a control code corresponding to each registered word. For example, when the speech instruction recognition circuit 106 extracts a registered word relating to the telephone operation, that is, when a recognition result is obtained, the control code for the telephone operation corresponding to the recognized registered word is called from the speech instruction information memory 107 and transmitted from a central control circuit 108 to the speech communications unit 102. The control code is not specifically limited as long as it can be used to control the speech communications unit 102. However, since an AT command is generally used, the AT command is adopted as a representative example in the present embodiment.

In a phone call operation, when the name of a person or phone number information is input by speech from the microphone 103, a registered word contained in the speech is recognized, and the speech recognition result is displayed on the LCD display unit 109 for visual notice; the corresponding response speech is also called from a response speech information memory 118 by a response speech control circuit 110 and is aurally announced as an analog signal from a speaker 113. When the recognition result is correct and the user inputs a speech command such as “make a call” from the microphone 103, the central control circuit 108 converts the call-issuing control for the destination phone number into an AT command and transmits it to a one-chip microcomputer 114 of the speech communications unit 102.

When a telephone line is connected and the schedule contents is enabled, speech communications are performed using a microphone 115 and a speaker 116 of the speech communications unit 102, and the volume level of the microphone 103 and the speaker 113 of the speech recognition unit 101 can be adjusted independently of the microphone 115 and the speaker 116 of the speech communications unit 102.

In the speech recognition unit 101, when the control code for control of the telephone is transmitted from the central control circuit 108 to the speech communications unit 102 through an external interface 117, the on-hook status, the off-hook status, or the line communications status of the speech communications unit 102 can be checked by receiving a status signal from the speech communications unit 102, and misrecognition due to an unnecessary word can be reduced by sequentially switching to the registered vocabulary lists necessary for the subsequent operations depending on the status. For example, when an incoming call is received, ringing information announcing the call received at the speech communications unit 102 is transmitted to the speech recognition unit 101, the call receiving operation vocabulary list relating to a response to an incoming call is thereby called, the decision of the user as to whether or not to answer the call is input by speech using the microphone 103 of the speech recognition unit 101, and telephone communications can be performed hands-free by speech input. At this time, if destination information such as the phone number of the destination can be obtained, then the name and the phone number are compared with the name vocabulary list, the comparison result is displayed on the LCD display unit 109 for visual notice, the response speech data corresponding to the comparison result is called from the response speech information memory 118 using the response speech control circuit 110, and the announcement “a call from Mr. ooo” can be aurally output from the speaker 113 through the D/A converter 111 and the amplifier 112.
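The switching of registered vocabulary lists according to the status of the speech communications unit 102 can be pictured as a simple lookup. The following is only a sketch: the list names are taken from the description, but the assignment of lists to each status, the `LineStatus` type, and the function name are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List


class LineStatus(Enum):
    ON_HOOK = "on-hook"
    RINGING = "incoming call"
    OFF_HOOK = "off-hook"


# Assumption: which lists are active in which status is chosen here only
# to illustrate the idea of narrowing the vocabulary per status.
VOCABULARY_BY_STATUS: Dict[LineStatus, List[str]] = {
    LineStatus.ON_HOOK: [
        "name vocabulary list",
        "number vocabulary list",
        "telephone call operation vocabulary list",
    ],
    LineStatus.RINGING: ["call receiving operation vocabulary list"],
    LineStatus.OFF_HOOK: ["communications operation vocabulary list"],
}


def active_vocabulary_lists(status: LineStatus) -> List[str]:
    """Return the registered vocabulary lists to load for a given status."""
    return list(VOCABULARY_BY_STATUS[status])


print(active_vocabulary_lists(LineStatus.RINGING))
```

Keeping only the lists needed in the current status means that fewer registered words compete during recognition, which is the mechanism by which the description expects misrecognition to be reduced.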

Thus, according to the present embodiment, by providing at least two speech input/output systems, each consisting of a microphone and a speaker, more detailed information can be transmitted to a user by means other than screen display, concurrently with the operation of the speaker 116 used in the normal ringing system. Compared with a method of transmitting detailed information only on the screen display, operations can be smoothly performed even in cases in which it is hard to confirm the destination information about an incoming call, for example, when the user is away from the body of the telephone, when the user cannot turn his or her eyes to the screen while driving a car, or when the user is a visually handicapped person.

FIG. 24 shows a variation that uses the wireless system of a mobile telephone as the connection means to a public telephone line. It differs from FIG. 23 in the primary block diagram of the speech communications unit 102. When the wireless system of a mobile phone is used, the normal input/output devices for speech communications, that is, the microphone 115 and the speaker 116 of the speech communications unit 102, are controlled to be powered on and off according to the speech receiving condition of the destination. Therefore, by separately preparing speech input/output devices for speech recognition, that is, the microphone 103 and the speaker 113, the telephone communication terminal having the function of speech recognition can be constantly used regardless of the operation status of the input/output devices for speech communications, which depends on the speech communications system. That is, even while a user is communicating with a partner and the microphone 115 and the speaker 116 of the speech communications unit 102 are occupied for the communications, the user can input speech to the speech recognition unit 101 and can control the speech communications unit 102. By contrast, in a method of inputting speech through a handset, with a dial signal automatically transmitted by speech, an off-hook mode is required as a telephone capability in order to constantly accept speech input. In this case, the receiver is constantly off-hook, so incoming calls are rejected.

FIG. 25 is a flowchart of the arithmetic process of an issuing operation, etc. performed by the central control circuit 108 when a user utters the name of a person. That is, FIG. 25 shows the process scheme relating to a call issuing operation using the name of a person. In this flowchart, although there is no step for communications, the information obtained in the arithmetic process is updated and stored in the storage device at any time, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S601, the initial status of the speech communications unit 102 is confirmed by detecting the on-hook status, that is, the status of accepting an issue of a call. Practically, it is determined whether or not the unit is in the on-hook status by receiving a status signal from the speech communications unit 102. If it is in the on-hook status (YES), then control is passed to step S602. Otherwise (NO), the process flow is repeated.

In step S602, the input of a name by speech from a user is received. Practically, a name vocabulary list storing the names and phone numbers is read as the registered vocabulary list, the speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether the speech contains a name registered in the registered vocabulary list, or contains only unnecessary words, that is, noise and speech other than the names of persons. For each name of a person, the speech instruction information memory 107 stores the corresponding phone number in the name vocabulary list. The sampling of the input analog speech is not specifically limited, but the speech is normally sampled and digitized at a specific frequency in the range from 8 kHz to 16 kHz. The likelihood of the digitized acoustic parameter is calculated relative to the acoustic parameter for each speech unit, which is a configuration unit of each word, for the name vocabulary list stored and registered in the speech instruction information memory 107 in the speech instruction recognition circuit 106, thereby extracting the most likely word from the name vocabulary list. That is, in the speech instruction recognition circuit 106, the likelihood of each name in the name vocabulary list stored and registered in the speech instruction information memory 107 is calculated for the digitized acoustic parameter for each configuration unit, and the name with the largest accumulation value of the likelihood is extracted as the registered name closest to the speech of the user. In the speech instruction recognition circuit 106, the likelihood of the unnecessary word model stored and registered in the speech instruction information memory 107 is simultaneously calculated for the digitized acoustic parameter. When the likelihood of the unnecessary word model is higher than the likelihood of the registered name, it is assumed that no registered name has been extracted from the digitized acoustic parameter.

In step S603, it is determined whether or not it was recognized in step S602 that the name of a person registered in the name vocabulary list is contained in the speech. If the name of a person registered in the registered vocabulary list is contained (YES), then control is passed to step S604. Otherwise (NO), control is passed to step S602.

In step S604, when the name of a person is extracted in step S602, the extracted name is displayed on the terminal screen (LCD display unit 109) connected to the speech communications unit 102, and the extracted name is announced by speech through the response speech control circuit 110.

Then, control is passed to step S605. As shown in FIG. 26, first, a message requesting the user to utter either a word indicating the process to be performed or a word indicating that the process is to be performed again is displayed on the LCD display unit 109. Then, the speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether the speech contains a word indicating the process to be performed, which is a registered word, or a word indicating that the process is to be performed again. It is then determined which of these words the speech detected by the microphone 103 contains. If it contains a word indicating the process to be performed (YES), then control is passed to step S606. Otherwise (NO), control is passed to step S602. That is, the user determines whether or not the extracted name is the desired result. If it is, the user utters a word indicating a process registered in advance, such as “make a call”, and the speech instruction recognition circuit 106 performs the process of recognizing the input speech command.

In step S606, the phone number corresponding to the name of the person extracted in step S602 is read from the name vocabulary list, the AT command corresponding to the phone number is called from the speech instruction information memory 107, and the AT command is transmitted to the speech communications unit 102. That is, as described above, if the uttered word is recognized as the word “make a call” registered in advance, the AT command (ATD) for issuing a call to the corresponding phone number is transmitted from the central control circuit 108 to the speech communications unit 102, and the line connection process is performed. If the communications partner goes off-hook in response to the calling tone, the line connection is completed, and the speech communication is performed.

On the other hand, if the extracted name is not the desired one, the user utters a speech command indicating that the process is to be performed again, for example, “once again”, and the input speech is recognized in the speech instruction recognition circuit 106. As described above, if a word such as “once again” registered in advance is recognized, control is returned to the step (S602) of accepting the utterance of the name of a person, and the system enters the status in which a new name of a person is accepted.
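The control flow of steps S602 to S606 can be summarized in a short sketch. It assumes that recognition has already taken place, so the input is a sequence of recognized words in which `None` stands for an utterance judged to be an unnecessary word. The function name, the example names, and the second phone number are hypothetical.

```python
from typing import Dict, Iterable, Optional

MAKE_A_CALL = "make a call"


def call_by_name(recognized: Iterable[Optional[str]],
                 name_list: Dict[str, str]) -> Optional[str]:
    """Follow steps S602 to S606 and return the phone number to dial.

    recognized yields recognition results; None means an unnecessary word.
    name_list maps each registered name to its phone number.
    Returns None if the input ends before a call is confirmed.
    """
    pending_name: Optional[str] = None
    for word in recognized:
        if pending_name is None:
            # S602/S603: wait until a registered name is recognized.
            if word in name_list:
                pending_name = word  # S604: the name would be displayed here.
            continue
        # S605: wait for the word indicating the process to be performed.
        if word == MAKE_A_CALL:
            return name_list[pending_name]  # S606: number to be dialed.
        # Anything else, including "once again", returns control to S602.
        pending_name = None
    return None


# Hypothetical name vocabulary list, for illustration only.
names = {"Tanaka": "0333561234", "Suzuki": "09012345678"}
print(call_by_name([None, "Tanaka", "once again", "Suzuki", "make a call"], names))
```

The sketch returns only the phone number; converting it into a control code for the speech communications unit 102 is left out.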

FIG. 7 shows an example of the speech recognizing process performed by the speech instruction recognition circuit 106. The method of the speech recognizing process is not specifically limited. However, according to the present embodiment, as in the first embodiment, a process using a hidden Markov model (hereinafter referred to as an HMM for short) is used. When the speech recognizing process is performed, first the speech detected by the microphone 103 is converted into a digitized spectrum by a Fourier transform or a wavelet transform, and the speech data is characterized by applying a speech modeling method such as a linear prediction analysis, a cepstrum analysis, etc. to the spectrum. Then, for the characterized speech data, the likelihood of an acoustic model 121 of each word registered in a vocabulary network 120, which is read in advance in the speech recognizing process, is calculated using the Viterbi algorithm. Each registered word is modeled as a serial connection network of HMMs corresponding to a serial connection of speech units (a speech unit label series), and the vocabulary network 120 is modeled as a serial connection network corresponding to the registered word group registered in the registered vocabulary list. Each registered word is configured from speech units such as phonemes, and the likelihood is calculated for each speech unit. When the termination of the utterance of a user is detected, the registered word having the largest accumulation value of likelihood is detected from the registered vocabulary list, and that registered word is output as the registered word recognized as contained in the speech.
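As a rough illustration of how the Viterbi algorithm accumulates the likelihood of one word model, the following sketch scores a sequence of observations against a small HMM in the log domain. It makes simplifying assumptions that are not in the description: observations are discrete symbols rather than acoustic feature vectors, and the two toy word models and their probabilities are invented for the example.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def to_log(p: float) -> float:
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_likelihood(obs: Sequence[int],
                           start: List[float],
                           trans: List[List[float]],
                           emit: List[List[float]]) -> float:
    """Log-likelihood of the single best state path through one HMM."""
    if not obs:
        raise ValueError("observation sequence is empty")
    n = len(start)
    # Initialization with the first observation.
    score = [to_log(start[s]) + to_log(emit[s][obs[0]]) for s in range(n)]
    # Recursion: keep only the best predecessor for each state.
    for o in obs[1:]:
        score = [
            max(score[p] + to_log(trans[p][s]) for p in range(n))
            + to_log(emit[s][o])
            for s in range(n)
        ]
    return max(score)


# Two toy left-to-right word models over symbols {0, 1} (assumed values).
start = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
word_a_emit = [[0.9, 0.1], [0.1, 0.9]]  # tends to emit 0 first, then 1
word_b_emit = [[0.1, 0.9], [0.9, 0.1]]  # tends to emit 1 first, then 0

observations = [0, 0, 1, 1]
score_a = viterbi_log_likelihood(observations, start, trans, word_a_emit)
score_b = viterbi_log_likelihood(observations, start, trans, word_b_emit)
print("word A" if score_a > score_b else "word B")
```

Working with sums of logarithms avoids numerical underflow on long utterances, and the word whose model yields the largest accumulated value is the one that would be output.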

Furthermore, as in the first embodiment, the virtual model 23 for recognition of an unnecessary word is provided parallel to the vocabulary network 120 of registered words. With this configuration, when speech or noise not containing a registered word, that is, an unnecessary word, is input as speech, the likelihood of the virtual model 23 corresponding to the unnecessary word is calculated to be larger than the likelihood of the registered word, and it is determined that an unnecessary word has been input, thereby avoiding the misrecognition of an utterance, etc. containing no registered word as a registered word.

FIG. 27 is a flowchart of the arithmetic process of an issuing operation, etc. performed by the central control circuit 108 when a user utters a phone number. That is, FIG. 27 shows the process scheme relating to a call issuing operation using a phone number. In this flowchart, although there is no step for communications, the information obtained in the arithmetic process is updated and stored in the storage device at any time, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S701, the initial status of the speech communications unit 102 is confirmed by detecting the on-hook status, that is, the status of accepting an issue of a call. Practically, it is determined whether or not the unit is in the on-hook status by receiving a status signal from the speech communications unit 102. If it is in the on-hook status (YES), then control is passed to step S702. Otherwise (NO), the process flow is repeated.

In step S702, it is determined whether or not a phone number confirmation mode for accepting an arbitrary phone number is entered. If the mode is entered (YES), then control is passed to step S704. Otherwise (NO), control is passed to step S703.

In step S703, the speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether or not the speech contains a speech command registered in advance for reception of a phone number, which is a registered word. If the speech command is recognized, control is passed to step S704. That is, the user confirms whether or not the terminal is in the phone number recognition mode for reception of an arbitrary phone number. If it is in a mode other than the phone number recognition mode, such as a name recognition mode, then the user utters the speech command registered in advance for reception of a phone number.

In step S704, a number vocabulary list for recognition of a series of numbers depending on the number of digits corresponding to an arbitrary phone number is first called as a registered vocabulary list. Next, as shown in FIG. 28, a message requesting the user to utter a phone number is displayed on the LCD display unit 109. The speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether or not the speech contains a series of numbers, which are registered words. For example, “phone call by number” is a speech command registered for acceptance of a phone number. When the user utters “phone call by number”, the speech instruction recognition circuit 106 recognizes the speech input through the microphone 103. If “phone call by number” is recognized, the speech instruction recognition circuit 106 loads a number vocabulary list for recognition of any phone number into the memory of the speech instruction recognition circuit, thereby entering the phone number acceptance mode. The user then continuously utters a desired phone number such as “03-3356-1234” (the hyphens are not uttered) for speech recognition.

The number vocabulary list for recognition of any phone number refers to a set of patterns, each formed by a series of character strings, that depend on the nation and area in which the phone is used, the phone communications system, and the nation and area of the communication partner. For example, when a call is made from Japan to predetermined telephone models, the pattern is represented by “0-inter-city number-intra-city number-subscriber number”, that is, serial numbers with a total of 10 digits (9 digits in specific areas) form a number vocabulary list. Between the inter-city number and the intra-city number, or between the intra-city number and the subscriber number, “no” and a speech unit indicating a space can be inserted so that redundancy in the way a user utters a phone number can be accommodated.

When a call is made from Japan to a mobile phone or PHS in Japan, a vocabulary list formed by a series of 11 digits starting with “0A0 (A indicates a single number other than 0)” is prepared. In addition, there is also a dedicated number vocabulary list formed by number strings according to the numeral strings that the Ministry of General Affairs indicates for each common carrier. Table 2 lists the phone number patterns in Japan published by the Ministry.

As described above, according to the present invention, when a phone number is recognized, a user only has to continuously utter a number string corresponding to all the digits of a phone number, so that the phone number is recognized in a short time. By contrast, in a method of recognizing a phone number digit by digit, a long time is required to correctly recognize all digits.

TABLE 2

Number pattern | Class of destination
Number starting with 00 | When a call is made through a common carrier, or when an international call is made
Number starting with 0A0 (A is other than 0) | When a call is made to a mobile phone, a PHS, a pocket bell for which a call issuer takes charge
Number starting with 0AB0 (A and B are other than 0) | When a high quality phone service provided by a common carrier is used
Phone number starting with 0ABC (A, B, and C are other than 0) | When a call is made to a common fixed type telephone (inter-city communications) (0 - inter-city number - intra-city number - subscriber number)
Number starting with 1 | When a call service has a value added and is important as an emergency service, a common service, a security service, etc.
Number starting with 2-9 | When a call is made to a common fixed type telephone (intra-city communications)

The method of allocating each number vocabulary list to the speech instruction recognition circuit 106 is chosen appropriately depending on the recognition precision of the speech recognition engine used by the speech instruction recognition circuit 106. One of the methods is to dynamically determine the pattern from the first 3 to 4 digits recognized at the head of the numeral string as it is input by speech through the microphone 103, and to dynamically allocate the number vocabulary list selected for the recognized pattern. In this method, for example, when the number “0 (zero)” is recognized at the first and third digits of the first 3-digit number string, the string is considered in Japan to follow the pattern of a phone number of a mobile phone, a PHS, etc., and a number vocabulary list for recognition of the remaining 8-digit number string (a total of 11 digits) or of a specific number string is allocated.
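The dynamic method can be illustrated by a small classifier that maps the leading digits of an uttered number onto the patterns of Table 2. This is a sketch only: it assumes the leading digits have already been recognized as a digit string, and the function name and the returned labels are hypothetical.

```python
from typing import Optional


def classify_number_prefix(digits: str) -> Optional[str]:
    """Classify the head of a phone number according to Table 2.

    Returns None when more digits are needed before a pattern can be chosen.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of digits 0-9")
    if digits[0] == "1":
        return "special service"
    if digits[0] != "0":
        return "fixed telephone (intra-city)"  # starts with 2 to 9
    if len(digits) < 2:
        return None
    if digits[1] == "0":
        return "common carrier or international"  # 00...
    if len(digits) < 3:
        return None
    if digits[2] == "0":
        return "mobile phone, PHS or pocket bell"  # 0A0...
    if len(digits) < 4:
        return None
    if digits[3] == "0":
        return "high quality phone service"  # 0AB0...
    return "fixed telephone (inter-city)"  # 0ABC...


for head in ["090", "0333", "0120", "001", "03"]:
    print(head, "->", classify_number_prefix(head))
```

Once a label is returned, the corresponding number vocabulary list (for example, the one for the remaining 8 digits of a mobile number) would be allocated for the rest of the utterance.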

In another method, all number vocabulary lists are statically read into the speech instruction recognition circuit 106, and a likelihood indicating how well the input fits each specific number pattern is calculated as a time-varying average from the head of the phone number input by the user. Thus, several probable patterns are left as candidates, and the other patterns are removed from the arithmetic operation. Finally, when the utterance section is detected, the pattern having the highest likelihood is obtained, and the most likely number is determined. In these methods, since a pattern is selected from among an enormous number of probable number strings, the recognition precision can be improved and the load of the arithmetic operation required in recognition can be reduced, so that the uttered numbers can be continuously recognized as a phone number.
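The static method can be sketched as a pruning loop over per-frame scores. The sketch assumes that each frame supplies one log-likelihood per pattern and that a pattern is dropped when its running average falls more than a fixed margin below the best average; the margin (`beam`), the function name, and the sample scores are assumptions, since the description does not specify the pruning criterion.

```python
from typing import Dict, List, Tuple


def select_pattern(frame_scores: List[Dict[str, float]],
                   beam: float) -> Tuple[str, List[str]]:
    """Return the most likely pattern and the patterns that survived pruning.

    frame_scores[t][p] is the log-likelihood of pattern p at frame t.
    Patterns whose running average drops below (best average - beam)
    are removed from further calculation.
    """
    if not frame_scores:
        raise ValueError("no frames were supplied")
    alive = set(frame_scores[0])
    totals: Dict[str, float] = {p: 0.0 for p in alive}
    for t, scores in enumerate(frame_scores, start=1):
        # Only surviving patterns are still scored.
        for p in alive:
            totals[p] += scores[p]
        averages = {p: totals[p] / t for p in alive}
        best = max(averages.values())
        alive = {p for p in alive if averages[p] >= best - beam}
    winner = max(alive, key=lambda p: totals[p])
    return winner, sorted(alive)


# Hypothetical per-frame scores for three patterns.
frames = [
    {"mobile": -1.0, "fixed": -1.2, "service": -5.0},
    {"mobile": -1.0, "fixed": -0.8, "service": -6.0},
    {"mobile": -3.0, "fixed": -0.5, "service": -0.1},
]
print(select_pattern(frames, beam=2.0))
```

In this example the "service" pattern is discarded after the first frame, so its good score in the last frame is never evaluated, which is how pruning reduces the arithmetic load.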

In step S705, the phone number recognized in step S704 is displayed on the LCD display unit 109, the recognition result is transmitted to the response speech control circuit 110, and the phone number is announced by speech through the D/A converter 111.

Then, control is passed to step S706. First, a message requesting the user to utter either a word indicating the process to be performed or a word indicating that the process is to be performed again is displayed on the LCD display unit 109. Then, the speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether the speech contains a word indicating the process to be performed, which is a registered word, or a word indicating that the process is to be performed again. It is then determined which of these words the speech detected by the microphone 103 contains. If it contains a word indicating the process to be performed (YES in step S706′), then control is passed to step S707. Otherwise (NO in step S706″), control is passed to step S704.

In step S707, the AT command corresponding to the phone number extracted in step S704 is called from the speech instruction information memory 107, and the AT command is transmitted to the speech communications unit 102.

FIG. 29 is a flowchart of the arithmetic process of an on-hook operation, etc. performed by the central control circuit 108 when a user utters a word indicating the termination of the communications. That is, FIG. 29 shows the process scheme relating to an on-hook operation for termination of communications. In this flowchart, although there is no step for communications, the information obtained in the arithmetic process is updated and stored in the storage device at any time, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S801, the operation status of the speech communications unit 102 is confirmed as a communications mode by detecting the off-hook status. Practically, it is determined whether or not the unit is in the off-hook status by receiving a status signal from the speech communications unit 102. If it is in the off-hook status (YES), then control is passed to step S802. Otherwise (NO), the process flow is repeated.

In step S802, first, a communications operation vocabulary list, in which only the speech commands required during communications and when the communications are terminated are registered in advance, is read as the registered vocabulary list. Then, the speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether or not the speech contains the speech command indicating the termination of the communications, which is a registered word.

Then, in step S803, an AT command indicating a line disconnection is called from the speech instruction information memory 107, and the AT command is transmitted to the speech communications unit 102. Thus, if a user utters the speech command indicating the termination of communications, for example, “disconnect the line”, the speech instruction recognition circuit 106 recognizes the speech input through the microphone 103. If “disconnect the line” is recognized, the control code indicating a line disconnection is transmitted to the speech communications unit 102 using the AT command (ATH) from the central control circuit 108, thereby completing the disconnection of the line.
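Steps S801 to S803 can be illustrated by the following minimal sketch. It is not the implementation of the embodiment: the function name, the dictionary standing in for the communications operation vocabulary list, and the boolean status argument are assumptions made for illustration. Only the phrase “disconnect the line” and the AT command ATH are taken from the description.

```python
from typing import Optional

# Hypothetical stand-in for the communications operation vocabulary list
# together with the AT commands held in the speech instruction information
# memory 107. Only this one pairing is stated in the description.
COMMUNICATIONS_VOCABULARY = {"disconnect the line": "ATH"}


def on_hook_step(is_off_hook: bool, recognized: Optional[str]) -> Optional[str]:
    """One pass through steps S801-S803.

    Returns the AT command to transmit to the speech communications unit,
    or None when the flow should simply be repeated.
    """
    # S801: proceed only while the unit reports off-hook (communications mode).
    if not is_off_hook:
        return None
    # S802: accept only words registered in the vocabulary list.
    # S803: look up the AT command for the recognized word.
    return COMMUNICATIONS_VOCABULARY.get(recognized)


print(on_hook_step(True, "disconnect the line"))
```

Restricting the lookup to a small vocabulary list mirrors the description, in which only the commands needed during communications are registered for this state.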

FIG. 30 is a flowchart of the arithmetic process of an off-hook operation, etc. performed by the central control circuit 108 when a user utters a word in response to an incoming call. That is, FIG. 30 shows the process scheme relating to an off-hook operation for receiving an incoming call. In this flowchart, although there is no step for communications, the information obtained in the arithmetic process is updated and stored in the storage device at any time, and necessary information is read from the storage device. When the arithmetic process is performed, first in step S901, the operation status of the speech communications unit 102 is confirmed as a standby status by detecting the on-hook status. Specifically, it is determined whether or not the unit is in on-hook status by receiving a status signal from the speech communications unit 102. If it is in on-hook status (YES), then control is passed to step S902. Otherwise (NO), the process flow is repeated.

In step S902, it is determined whether or not a result code indicating an incoming call has been received from the speech communications unit 102. If the result code has been received (YES), a message announcing that a call reception signal has been received is displayed on the LCD display unit 109, the message is transmitted to the response speech control circuit 110 and announced through the A/D converter 105, and then control is passed to step S903. Otherwise (NO), the process flow is repeated. That is, if the speech communications unit 102 receives a signal announcing the reception of an incoming call, it transmits to the central control circuit of the speech recognition unit the result code indicating the reception of the incoming call. Upon receipt of the incoming call signal, the speech recognition unit displays on the LCD display unit 109 the contents announcing the reception of the incoming call signal, and simultaneously allows the speaker 1 to announce the reception of an incoming call by speech. At this time, if the incoming call signal contains destination information, then the information is compared with the destination registered in the name vocabulary list. If a match is found, more detailed information, such as “a call from Mr. au”, can be presented to the user by speech and on the screen display.

Additionally, the destination information can be stored in memory. A prompt such as “The phone number is to be recorded?” can be announced, the user can be instructed to utter a word relating to a speech instruction registered in advance, such as “new registration” or “added registration”, and new destination data is then registered by speech in the name vocabulary list.

In step S903, a call receiving operation vocabulary list relating to the response to an incoming call is read to the speech instruction recognition circuit 106 as a registered vocabulary list. Then, the LCD display unit 109 displays a message requesting the user to utter a word indicating off-hook or a word indicating on-hook. In addition, the speech detected by the microphone 103 is read, and the speech instruction recognition circuit 106 recognizes whether or not the speech contains the word indicating off-hook, which is a registered word. Then, it is determined whether the speech detected by the microphone 103 contains the word indicating off-hook, which is a registered word, or a word indicating on-hook. If a word indicating off-hook is contained (YES in step S903′), control is passed to step S904. If a word indicating on-hook is contained (NO in step S903″), then control is passed to step S905. That is, the speech instruction recognition circuit 106 reads the call receiving operation vocabulary list relating to the response when an incoming call is received, and the user decides whether or not to answer the call depending on the situation. When the call is to be answered, the user utters a word indicating off-hook and registered in advance, for example, “answer the phone”. The speech instruction recognition circuit then determines whether or not the speech input through the microphone 103 is “answer the phone”.

In step S904, the AT command indicating off-hook is called from the speech instruction information memory 107, and the AT command is transmitted to the speech communications unit 102. That is, when the recognition result “answer the phone” is obtained, the AT command (ATA) indicating off-hook is transmitted from the central control circuit 108 to the speech communications unit, the communications mode is entered, and the speech communications are performed using the microphone 2 and the speaker 2.

On the other hand, in step S905, the AT command indicating on-hook is called from the speech instruction information memory 107, and the AT command is transmitted to the speech communications unit 102. That is, when the user does not want to answer the call, the user utters a word indicating a line disconnection and registered in advance, for example, “disconnect the line”. The speech instruction recognition circuit recognizes and determines whether or not the speech input through the microphone 103 is “disconnect the line”. If the recognition result “disconnect the line” is obtained, then the AT command (ATH) indicating a line disconnection is transmitted from the central control circuit to the speech communications unit, thereby disconnecting the incoming call signal.

When the number of rings reaches a value predetermined in the initial settings of the speech recognition unit, a control code of off-hook is automatically issued, or a control code for an answering phone mode is issued. Thus, a user-requested mode can be entered.
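The incoming-call handling of steps S902 to S905, together with the automatic off-hook after a predetermined number of rings, can be sketched as follows. This is an illustration under stated assumptions: the result code string "RING" is the conventional Hayes-style indication and is not named in the description, and the function name, the `max_rings` parameter and its default value are hypothetical. The phrases and the commands ATA and ATH are taken from the description.

```python
from typing import Optional

# Hypothetical stand-in for the call receiving operation vocabulary list.
CALL_RECEIVING_VOCABULARY = {
    "answer the phone": "ATA",     # off-hook, step S904
    "disconnect the line": "ATH",  # on-hook, step S905
}


def incoming_call_step(result_code: Optional[str],
                       recognized: Optional[str],
                       ring_count: int,
                       max_rings: int = 5) -> Optional[str]:
    """Decide which AT command, if any, to send for an incoming call."""
    # S902: nothing to do until an incoming-call result code arrives.
    # "RING" is an assumed result code, not one given in the description.
    if result_code != "RING":
        return None
    # S903: branch on the registered word found in the user's speech.
    command = CALL_RECEIVING_VOCABULARY.get(recognized)
    if command is not None:
        return command
    # No spoken instruction: go off-hook automatically once the preset
    # number of rings is reached (max_rings is an assumed setting).
    if ring_count >= max_rings:
        return "ATA"
    return None


print(incoming_call_step("RING", "answer the phone", ring_count=1))
```

An answering phone mode, which the description offers as an alternative to automatic off-hook, would replace the final "ATA" with whatever control code the communications unit defines for that mode.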

In the series of speech recognizing operations described above, the telephone communication terminal having the speech recognizing function according to the present invention has the speech instruction recognition circuit 106, in which a speech detection algorithm (VAD) constantly operates regardless of the presence/absence of speech input. Based on the VAD, it is repeatedly determined whether the sound input through the microphone 103, including noise, corresponds to a non-input status, a speech-being-input status, or a speech-completely-input status.
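The three VAD statuses can be pictured as a small state machine. The sketch below is an assumption-laden illustration, not the detection algorithm of the embodiment: it classifies frames by a simple energy threshold, and the threshold, the `hangover` count of silent frames that ends an utterance, and all names are hypothetical.

```python
from enum import Enum


class VadState(Enum):
    NON_INPUT = "non-input"
    SPEECH_BEING_INPUT = "speech-being-input"
    SPEECH_COMPLETELY_INPUT = "speech-completely-input"


def vad_states(frame_energies, threshold=0.1, hangover=3):
    """Return the VAD status after each frame of a sequence of energies."""
    states = []
    state = VadState.NON_INPUT
    silent_run = 0
    for energy in frame_energies:
        loud = energy >= threshold
        if state is VadState.SPEECH_COMPLETELY_INPUT:
            # The finished utterance has been reported; start listening again.
            state = VadState.NON_INPUT
        if state is VadState.NON_INPUT:
            if loud:
                state = VadState.SPEECH_BEING_INPUT
                silent_run = 0
        elif state is VadState.SPEECH_BEING_INPUT:
            if loud:
                silent_run = 0
            else:
                silent_run += 1
                # Enough consecutive quiet frames: the utterance is complete.
                if silent_run >= hangover:
                    state = VadState.SPEECH_COMPLETELY_INPUT
        states.append(state)
    return states


for s in vad_states([0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]):
    print(s.value)
```

Because the loop runs on every frame whether or not anyone is speaking, it reflects the constantly operating detection that the description requires.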

Since the speech instruction recognition circuit 106 constantly operates the speech recognition algorithm, sounds and words unnecessary for speech recognition can easily be input. Therefore, rejection functions are provided to avoid malfunctions by correctly recognizing these unnecessary words and sounds as such. A method for recognizing unnecessary words can be the garbage model method proposed by H. Bourlard, B. D'hoore and J.-M. Boite, “Optimizing Recognition and Rejection Performance in Wordspotting Systems,” Proc. ICASSP, Adelaide, Australia, pp. I-373-376, 1994, etc.

As shown in FIG. 28, the timing notice image 30 is displayed according to the three statuses of the internal process of the VAD: it is expressed in green when speech is in a non-input status, in yellow when speech is in a speech-being-input status, and in red when speech is in a speech-completely-input status. The timing notice image 30 is displayed at the upper portion of the LCD display unit 109. Simultaneously, a level meter 31 is displayed at the right end of the LCD display unit 109. The level meter 31 extends upwards depending on the volume of the speech detected by the microphone 103. That is, the value of the level meter 31 grows with the volume of the speech. The three statuses of the internal process of the above-mentioned VAD, that is, the timing notice image 30, are displayed on the LCD display device 62 of the speech recognition unit 101, and the timing of the start of the utterance is thereby announced to the user. As a result, unnecessary sounds and words can be discriminated from the necessary utterance, and the level of the speech detected by the microphone 103 is indicated by the level meter 31, so that the user is guided to speak at an appropriate volume. As a result, a registered word can be more easily recognized.
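The display behaviour can be summarized in a short sketch. The colour assignments follow the description; everything else (the status strings used as keys, the function name, the normalization by `max_volume`, and the eight-row meter) is an assumption for illustration only.

```python
# Colour of the timing notice image 30 for each VAD status (per the text).
TIMING_NOTICE_COLORS = {
    "non-input": "green",
    "speech-being-input": "yellow",
    "speech-completely-input": "red",
}


def level_meter_height(volume, max_volume=1.0, meter_rows=8):
    """Number of lit rows of the level meter 31 for a given input volume.

    The meter grows with the volume and is clamped to the available rows.
    max_volume and meter_rows are assumed display parameters.
    """
    ratio = min(max(volume / max_volume, 0.0), 1.0)
    return int(ratio * meter_rows)


print(TIMING_NOTICE_COLORS["speech-being-input"], level_meter_height(0.5))
```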

According to the present invention, the microphone 103 and the speaker 113 of the speech recognition unit 101 and the microphone 115 and the speaker 116 of the speech communications unit 102 correspond to the speech input/output means, the speech instruction recognition circuit 106 corresponds to the speech recognition means, the speech instruction information memory 107 corresponds to the storage means, the LCD display unit 109 corresponds to the screen display means, the central control circuit 108 corresponds to the control means, the microphone 103 corresponds to the speech detection means, the timing notice image 30 corresponds to the utterance timing notice means, and the level meter 31 corresponds to the volume notice means.

The above-mentioned embodiments are only examples of the speech recognition method, the remote controller, the information terminal, the telephone communication terminal, and the speech recognizer according to the present invention, and do not limit the configuration, etc. of the apparatus.

For example, in the above-mentioned embodiments, the remote controller, the information terminal, and the telephone communication terminal are formed individually, but they are not limited to these configurations. For example, the remote controller body 1 according to the first embodiment or the telephone communication terminal according to the third embodiment of the present invention can be provided with the communications unit 52 according to the second embodiment so that the remote controller body 1 can perform the electronic mail transmitting and receiving function, the schedule managing function, the speech memo processing function, the speech timer function, etc. based on the speech recognition result. With this configuration, as in the second embodiment, a user can use each function only by uttering a registered word without physical operations.

Furthermore, the remote controller body 1 according to the first embodiment can be provided with the speech communications unit 102 according to the third embodiment to allow the remote controller body 1 to perform speech recognition, and the telephone operation can be performed based on the speech recognition result. Thus, as in the third embodiment, even while a user is communicating with a partner and the microphone 115 and the speaker 116 of the speech communications unit 102 are occupied by the communications, speech can be input to the remote controller body 1, and the speech communications unit 102 can be controlled.

Furthermore, the remote controller body 1 of the first embodiment can be provided with the communications unit 52 according to the second embodiment and the speech communications unit 102 according to the third embodiment so that the remote controller body 1 can perform speech recognition. Based on the speech recognition result, the telephone operation can be performed. Additionally, based on the speech recognition result, the electronic mail transmitting and receiving function, the schedule managing function, the speech memo processing function, the speech timer function, etc. can be performed. With this configuration, as in the second embodiment, the user can use each function only by uttering a registered word without any physical operation. Furthermore, as in the third embodiment, even while a user is communicating with a partner and the microphone 115 and the speaker 116 of the speech communications unit 102 are occupied by the communications, speech can be input to the remote controller body 1, and the speech communications unit 102 can be controlled.

INDUSTRIAL APPLICABILITY

As described above, the speech recognition method according to the present invention also calculates the likelihood of the speech unit label series for an unnecessary word other than the registered word in the comparing process using the Viterbi algorithm. If speech other than a registered word, such as noise arising in normal living conditions that contains no registered word, is converted into an acoustic parameter series, then the likelihood of the acoustic model corresponding to the speech unit label series for the unnecessary word is calculated as a large value. Based on the likelihood, the speech other than the registered word can be recognized as an unnecessary word, thereby preventing the speech other than the registered word from being misrecognized as a registered word.
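The rejection principle can be made concrete with a toy example. The following sketch is only an illustration under strong assumptions: it uses discrete observation symbols in place of an acoustic parameter series, three made-up speech units with made-up emission probabilities, a fixed self-loop probability, and it interprets the virtual model as the average of the unit emission distributions. None of the names or numbers come from the embodiment.

```python
import math

# Toy discrete emission models for three hypothetical speech units.
UNITS = {
    "a": {"A": 0.8, "K": 0.1, "I": 0.1},
    "k": {"A": 0.1, "K": 0.8, "I": 0.1},
    "i": {"A": 0.1, "K": 0.1, "I": 0.8},
}
UNNECESSARY = "<unnecessary>"


def viterbi_log_likelihood(observations, states, self_loop=0.5):
    """Best-path log-likelihood of a left-to-right HMM.

    `states` is a list of emission dicts. Each state either repeats
    (self_loop) or advances to the next one; the path starts in the first
    state and must end in the last. Expects a non-empty observation list.
    """
    stay, move = math.log(self_loop), math.log(1.0 - self_loop)

    def emit(s, symbol):
        # Small floor so unseen symbols do not produce log(0).
        return math.log(states[s].get(symbol, 1e-6))

    score = [float("-inf")] * len(states)
    score[0] = emit(0, observations[0])
    for symbol in observations[1:]:
        previous, score = score, []
        for s in range(len(states)):
            best = previous[s] + stay
            if s > 0:
                best = max(best, previous[s - 1] + move)
            score.append(best + emit(s, symbol))
    return score[-1]


def virtual_unit_model(units):
    """Assumed reading of the virtual model: average all unit emissions."""
    symbols = {sym for dist in units.values() for sym in dist}
    return {sym: sum(d.get(sym, 0.0) for d in units.values()) / len(units)
            for sym in symbols}


def recognize(observations, registered_words):
    """Return the best registered word, or UNNECESSARY if the virtual model wins."""
    scores = {word: viterbi_log_likelihood(observations, [UNITS[u] for u in word])
              for word in registered_words}
    # The virtual model is scored in parallel with the registered words.
    scores[UNNECESSARY] = viterbi_log_likelihood(
        observations, [virtual_unit_model(UNITS)])
    return max(scores, key=scores.get)


print(recognize(list("AAKII"), ["aki"]))
print(recognize(list("IKKAA"), ["aki"]))
```

In this toy setting the first input follows the unit order of the registered word and is accepted, while the second does not fit any registered word well, so the averaged model scores higher and the input is rejected. A practical recognizer would use continuous acoustic features and trained models rather than these hand-set tables.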

Furthermore, since the remote controller according to the present invention recognizes a word to be recognized contained in the speech of a user by the speech recognition method, utterances other than the word to be recognized, and noise arising in normal living conditions, can be rejected at a high rate. Thus, a malfunction and misrecognition can be avoided.

Additionally, the information terminal according to the present invention recognizes a registered word contained in the speech of a user by the speech recognition method. Therefore, when speech other than a registered word, such as noise arising in normal living conditions that contains no registered word, is input, the likelihood of the acoustic model corresponding to the speech unit label series for an unnecessary word is calculated as a large value for the acoustic parameter series of the speech. Based on the likelihood, the speech other than the registered word can be recognized as an unnecessary word, thereby preventing the speech other than the registered word from being misrecognized as a registered word, and avoiding a malfunction of the information terminal.

The telephone communication terminal according to the present invention can constantly perform speech recognition. When a call is placed, misrecognition can be reduced whether a keyword representing a phone number or an arbitrary phone number is uttered. When a phone number itself is recognized, the utterance can be recognized digit by digit without restricting how the caller utters a continuous string of numbers. On the receiving side, an off-hook operation can be performed using speech input. Therefore, telephone operations can be performed hands-free in transmitting and receiving a call. That is, since the communications unit and the speech recognition unit have respective and independent input/output systems, the speech of a user can be input to the speech recognition unit, and the communications unit can be controlled, even while the user is communicating with a partner and the input/output systems of the communications unit are occupied by the communications.

Since the speech recognizer according to the present invention notifies that it is in a state of recognizing a registered word, a user can utter a registered word with an appropriate timing and the registered word can be easily recognized.

Furthermore, since a speech recognizing process similar to that in the first embodiment is used, when speech other than a registered word is uttered by a user, as in the first embodiment, the likelihood of the unnecessary word model 23 is calculated as a large value while the likelihood of the vocabulary network 22 of registered words is calculated as a small value. Based on the likelihoods, the speech other than the registered word is recognized as an unnecessary word, the speech other than the registered word is prevented from being misrecognized as a registered word, and a malfunction of the telephone communication terminal can be avoided.

1. A speech recognition method which performs speech recognition by converting input speech of a target person whose speech is to be recognized into an acoustic parameter series, and comparing using a Viterbi algorithm the acoustic parameter series with an acoustic model corresponding to a speech unit label series about a registered word, comprising parallel to a speech unit label series for the registered word a speech unit label series for recognition of an unnecessary word other than a registered word, in which also a likelihood of the speech unit label series is calculated for an unnecessary word other than the registered word in the comparing process using the Viterbi algorithm, thereby successfully recognizing the unnecessary word as an unnecessary word when the unnecessary word is input as input speech, characterized in that said acoustic model corresponding to the speech unit label series is an acoustic model using a hidden Markov model, and the speech unit label series for recognition of the unnecessary word is a virtual speech unit model obtained by equalizing all available speech unit models.
2. A speech recognition method which performs speech recognition by converting input speech of a target person whose speech is to be recognized into an acoustic parameter series, and comparing using a Viterbi algorithm the acoustic parameter series with an acoustic model corresponding to a speech unit label series about a registered word, comprising parallel to a speech unit label series for the registered word a speech unit label series for recognition of an unnecessary word other than a registered word, in which also a likelihood of the speech unit label series is calculated for an unnecessary word other than the registered word in the comparing process using the Viterbi algorithm, thereby successfully recognizing the unnecessary word as an unnecessary word when it is input as input speech, characterized in that said acoustic model corresponding to the speech unit label series is an acoustic model using a hidden Markov model, and the speech unit label series for recognition of the unnecessary word configures a self-loop from an end point to a starting point of a set of phoneme models corresponding to only the phonemes of vowels.
3. A speech recognition method which performs speech recognition by converting input speech of a target person whose speech is to be recognized into an acoustic parameter series, and comparing using a Viterbi algorithm the acoustic parameter series with an acoustic model corresponding to a speech unit label series about a registered word, comprising parallel to a speech unit label series for the registered word a speech unit label series for recognition of an unnecessary word other than a registered word, in which also a likelihood of the speech unit label series is calculated for an unnecessary word other than the registered word in the comparing process using the Viterbi algorithm, thereby successfully recognizing the unnecessary word as an unnecessary word when it is input as input speech, characterized in that said acoustic model corresponding to the speech unit label series is an acoustic model using a hidden Markov model, and the speech unit label series for recognition of the unnecessary word is a virtual speech unit model obtained by equalizing all available speech unit models provided parallel to a phoneme model configured as a self-loop network of only the phonemes of vowels.
4. A remote controller which remotely controls by speech a plurality of operation targets, comprising: storage means for storing a word to be recognized indicating a remote operation; means for inputting speech uttered by a user; speech recognition means for recognizing the word to be recognized and contained in the speech uttered by a user using the storage means; and transmission means for transmitting an equipment control signal corresponding to a word to be recognized and actually recognized by the speech recognition means, characterized in that the speech recognition method is based on the speech recognition method according to any of claims 1 to 3.
5. The remote controller according to claim 4, further comprising: a speech input unit for allowing a user to perform communications; and a communications unit for controlling the setting state to the communications line based on the word to be recognized by the speech recognition means, characterized in that the speech input means and the speech input unit of the communications unit can be separately provided.
6. The remote controller according to claim 5, further comprising control means for performing at least one of a process of transmitting and receiving mail by speech, a process of managing a schedule by speech, a memo processing by speech, and a notifying process by speech.
7. An information terminal, comprising: speech detection means for detecting speech of a user; speech recognition means for recognizing a registered word contained in the speech detected by the speech detection means; and control means for performing at least one of the speech recognizing process, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech based on the registered word recognized by the speech recognition means, characterized in that the speech recognition means can recognize a registered word contained in the speech detected by the speech detection means in the speech recognition method according to any of claims 1 to 3.
8. A telephone communication terminal which can be connected to a public telephone line network or an Internet communications network, comprising: speech input/output means for inputting and outputting speech; speech recognition means for recognizing input speech; storage means for storing personal information including the name and phone number of a communication partner; screen display means; and control means for controlling each means, characterized in that the speech input/output means has respective and independent input/output systems in the communications unit and the speech recognition unit.
9. A telephone communication terminal which can be connected to a public telephone line network or an Internet communications network, comprising: speech input/output means for inputting and outputting speech; speech recognition means for recognizing input speech; storage means for storing personal information including the name and phone number of a communication partner; screen display means; and control means for controlling each means, characterized in that the storage means separately stores a name vocabulary list of specific names including the name of a person registered in advance; a number vocabulary list of arbitrary phone numbers; a telephone call operation vocabulary list of telephone operations during communications; and a call receiving operation vocabulary list of telephone operations for an incoming call, and all telephone operations relating to an outgoing call, a disconnection, and an incoming call can be performed by the speech recognition means, the storage means, and the control means by input of speech.
10. The telephone communication terminal according to claim 8 or 9, characterized in that a method of recognizing a phone number can also be realized by recognizing a number string pattern formed by a predetermined number of digits or symbols using a number vocabulary list of the storage means and the phone number vocabulary network for recognition of an arbitrary phone number by the speech recognition means by inputting all number of digits of continuous utterance.
11. The telephone communication terminal according to claim 8, characterized in that the screen display means can have an utterance timing display function of announcing an utterance timing.
12. The telephone communication terminal according to claim 8, further comprising second control means for performing at least one of the process of transmitting and receiving mail by speech, the process of managing a schedule by speech, the memo processing by speech, and the notifying process by speech based on the input speech recognized by the speech recognition means.
13. The telephone communication terminal according to claim 8, characterized in that the speech recognition means recognizes a registered word contained in input speech in the speech recognition method according to claim 1.
14. (Cancelled)
15. (Cancelled)